PCT/GB00/01002

PATENT COOPERATION TREATY



From the INTERNATIONAL BUREAU 



PCT 

NOTIFICATION OF ELECTION 
(PCT Rule 61.2) 


To: 

Assistant Commissioner for Patents 
United States Patent and Trademark 
Office 
Box PCT 

Washington, D.C. 20231
ETATS-UNIS D'AMERIQUE 

in its capacity as elected Office 


Date of mailing (day/month/year)
18 October 2000 (18.10.00)




International application No. 
PCT/GB00/01002


Applicant's or agent's file reference
P6478WO ATM 


International filing date (day/month/year) 
17 March 2000 (17.03.00)


Priority date (day/month/year) 
17 March 1999 (17.03.99)


Applicant 

UDEN, Mark et al



1. The designated Office is hereby notified of its election made: 

[X] in the demand filed with the International Preliminary Examining Authority on:
16 August 2000 (16.08.00)

[ ] in a notice effecting later election filed with the International Bureau on:



2. The election [X] was

[ ] was not

made before the expiration of 19 months from the priority date or, where Rule 32 applies, within the time limit under 
Rule 32.2(b). 



The International Bureau of WIPO
34, chemin des Colombettes
1211 Geneva 20, Switzerland

Authorized officer
Zakaria EL KHODARY

Facsimile No.: (41-22) 740.14.35
Telephone No.: (41-22) 338.83.38


Form PCT/IB/331 (July 1992) 


GB0001002 




WORLD INTELLECTUAL PROPERTY ORGANIZATION
International Bureau

PCT

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(51) International Patent Classification:

C12N 15/86, 9/00, 9/22, 7/04, 5/10 // 
A61P 31/18 



A1



(11) International Publication Number: WO 00/55341 

(43) International Publication Date: 21 September 2000 (21.09.00)



(21) International Application Number: PCT/GB00/01002

(22) International Filing Date: 17 March 2000 (17.03.00) 



(30) Priority Data:
9906177.2    17 March 1999 (17.03.99)    GB



(71) Applicant (for all designated States except US): OXFORD
BIOMEDICA (UK) LIMITED [GB/GB]; Medawar Centre,
The Oxford Science Park, Robert Robinson Avenue, Oxford
OX4 4GA (GB).

(72) Inventors; and 

(75) Inventors/Applicants (for US only): UDEN, Mark [GB/GB]; 
Flat 2, Finsbury Park, 17 Sommerfield Road, London 
N4 2JN (GB). MITROPHANOUS, Kyriacos [GR/US]; 85 
Warwick Street, Oxford OX4 1SZ (US).

(74) Agents: MASCHIO, Antonio et al.; D Young & Co., 21 New 
Fetter Lane, London EC4A 1DA (GB).



(81) Designated States: AE, AG, AL, AM, AT, AU, AZ, BA, BB, 
BG, BR, BY, CA, CH, CN, CR, CU, CZ, DE, DK, DM, 
DZ, EE, ES, FI, GB, GD, GE, GH, GM, HR, HU, ID, IL, 
IN, IS, JP, KE, KG, KP, KR, KZ, LC, LK, LR, LS, LT, LU,
LV, MA, MD, MG, MK, MN, MW, MX, NO, NZ, PL, PT,
RO, RU, SD, SE, SG, SI, SK, SL, TJ, TM, TR, TT, TZ,
UA, UG, US, UZ, VN, YU, ZA, ZW, ARIPO patent (GH,
GM, KE, LS, MW, SD, SL, SZ, TZ, UG, ZW), Eurasian
patent (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), European
patent (AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR,
IE, IT, LU, MC, NL, PT, SE), OAPI patent (BF, BJ, CF,
CG, CI, CM, GA, GN, GW, ML, MR, NE, SN, TD, TG).



Published 

With international search report. 

Before the expiration of the time limit for amending the 
claims and to be republished in the event of the receipt of 
amendments. 



(54) Title: ANTI-VIRAL VECTORS
(57) Abstract 



A viral vector production system is provided which system comprises: (i) a viral genome comprising at least one first nucleotide 
sequence encoding a gene product capable of binding to and effecting the cleavage, directly or indirectly, of a second nucleotide sequence, 
or transcription product thereof, encoding a viral polypeptide required for the assembly of viral particles; (ii) a third nucleotide sequence 
encoding said viral polypeptide required for the assembly of the viral genome into viral particles, which third nucleotide sequence has a 
different nucleotide sequence to the second nucleotide sequence such that said third nucleotide sequence, or transcription product thereof, 
is resistant to cleavage directed by said gene product; wherein at least one of the gene products is an external guide sequence capable of 
binding to and effecting the cleavage by RNase P of the second nucleotide sequence. The viral vector production system may be used to 
produce viral particles for use in treating or preventing viral infection. 






FOR THE PURPOSES OF INFORMATION ONLY 



Codes used to identify States party to the PCT on the front pages of pamphlets publishing international applications under the PCT.



AL  Albania
AM  Armenia
AT  Austria
AU  Australia
AZ  Azerbaijan
BA  Bosnia and Herzegovina
BB  Barbados
BE  Belgium
BF  Burkina Faso
BG  Bulgaria
BJ  Benin
BR  Brazil
BY  Belarus
CA  Canada
CF  Central African Republic
CG  Congo
CH  Switzerland
CI  Côte d'Ivoire
CM  Cameroon
CN  China
CU  Cuba
CZ  Czech Republic
DE  Germany
DK  Denmark
EE  Estonia
ES  Spain
FI  Finland
FR  France
GA  Gabon
GB  United Kingdom
GE  Georgia
GH  Ghana
GN  Guinea
GR  Greece
HU  Hungary
IE  Ireland
IL  Israel
IS  Iceland
IT  Italy
JP  Japan
KE  Kenya
KG  Kyrgyzstan
KP  Democratic People's Republic of Korea
KR  Republic of Korea
KZ  Kazakstan
LC  Saint Lucia
LI  Liechtenstein
LK  Sri Lanka
LR  Liberia
LS  Lesotho
LT  Lithuania
LU  Luxembourg
LV  Latvia
MC  Monaco
MD  Republic of Moldova
MG  Madagascar
MK  The former Yugoslav Republic of Macedonia
ML  Mali
MN  Mongolia
MR  Mauritania
MW  Malawi
MX  Mexico
NE  Niger
NL  Netherlands
NO  Norway
NZ  New Zealand
PL  Poland
PT  Portugal
RO  Romania
RU  Russian Federation
SD  Sudan
SE  Sweden
SG  Singapore
SI  Slovenia
SK  Slovakia
SN  Senegal
SZ  Swaziland
TD  Chad
TG  Togo
TJ  Tajikistan
TM  Turkmenistan
TR  Turkey
TT  Trinidad and Tobago
UA  Ukraine
UG  Uganda
US  United States of America
UZ  Uzbekistan
VN  Viet Nam
YU  Yugoslavia
ZW  Zimbabwe



PATENT COOPERATION TREATY

PCT

INTERNATIONAL PRELIMINARY EXAMINATION REPORT

(PCT Article 36 and Rule 70) 




Applicant's or agent's file reference 
P006478WO ATM 



FOR FURTHER ACTION 



See Notification of Transmittal of International 
Preliminary Examination Report (Form PCT/IPEA/416) 



International application No. 
PCT/GB00/01002



International filing date (day/month/year)
17/03/2000 



Priority date (day/month/year) 
17/03/1999 



International Patent Classification (IPC) or national classification and IPC
C12N15/86 



Applicant 

OXFORD BIOMEDICA (UK) LIMITED et al. 



1. This international preliminary examination report has been prepared by this International Preliminary Examining Authority
and is transmitted to the applicant according to Article 36. 

2. This REPORT consists of a total of 5 sheets, including this cover sheet. 

□ This report is also accompanied by ANNEXES, i.e. sheets of the description, claims and/or drawings which have 
been amended and are the basis for this report and/or sheets containing rectifications made before this Authority 
(see Rule 70.16 and Section 607 of the Administrative Instructions under the PCT). 

These annexes consist of a total of sheets. 



3. This report contains indications relating to the following items:

I ☒ Basis of the report
II □
III □
IV □
V ☒ Reasoned statement under Article 35(2) with regard to novelty, inventive step or industrial applicability; citations and explanations supporting such statement
VI □
VII □
VIII □ Certain observations on the international application



Date of submission of the demand 
16/08/2000 



Date of completion of this report 
01.06.2001



Name and mailing address of the international
preliminary examining authority: 
European Patent Office 

D-80298 Munich 
Tel. +49 89 2399 - 0 Tx: 523656 epmu d 

Fax: +49 89 2399 - 4465 



Authorized officer 
Wimmer, G

Telephone No. +49 89 2399 7347 



Form PCT/IPEA/409 (cover sheet) (January 1994) 



INTERNATIONAL PRELIMINARY 
EXAMINATION REPORT 



International application No. PCT/GB00/01002



I. Basis of the report

1. With regard to the elements of the international application (Replacement sheets which have been furnished to
the receiving Office in response to an invitation under Article 14 are referred to in this report as "originally filed"
and are not annexed to this report since they do not contain amendments (Rules 70.16 and 70.17)):
Description, pages:

1-30 as originally filed

Claims, No.:

1-23 as originally filed

Drawings, sheets: 

1/14-14/14 as originally filed 

Sequence listing part of the description, pages: 

1-40 (SEQ ID NOs. 1-73), as originally filed 

2. With regard to the language, all the elements marked above were available or furnished to this Authority in the 
language in which the international application was filed, unless otherwise indicated under this item.

These elements were available or furnished to this Authority in the following language: , which is: 

□ the language of a translation furnished for the purposes of the international search (under Rule 23.1 (b)). 

□ the language of publication of the international application (under Rule 48.3(b)). 

□ the language of a translation furnished for the purposes of international preliminary examination (under Rule 
55.2 and/or 55.3). 

3. With regard to any nucleotide and/or amino acid sequence disclosed in the international application, the 
international preliminary examination was carried out on the basis of the sequence listing:

☒ contained in the international application in written form.

□ filed together with the international application in computer readable form.

□ furnished subsequently to this Authority in written form.

☒ furnished subsequently to this Authority in computer readable form.

☒ The statement that the subsequently furnished written sequence listing does not go beyond the disclosure in
the international application as filed has been furnished.

☒ The statement that the information recorded in computer readable form is identical to the written sequence
listing has been furnished.

4. The amendments have resulted in the cancellation of:

□ the description, pages:

□ the claims, Nos.: 

□ the drawings, sheets: 

5. □ This report has been established as if (some of) the amendments had not been made, since they have been 
considered to go beyond the disclosure as filed (Rule 70.2(c)): 

(Any replacement sheet containing such amendments must be referred to under item 1 and annexed to this
report.) 



6. Additional observations, if necessary: 



V. Reasoned statement under Article 35(2) with regard to novelty, Inventive step or industrial applicability; 
citations and explanations supporting such statement 

1. Statement 

Novelty (N) Yes: Claims 1-23 

No: Claims 

Inventive step (IS) Yes: Claims 

No: Claims 1-23 

Industrial applicability (IA) Yes: Claims 1-23

No: Claims 



2. Citations and explanations 
see separate sheet 



Form PCT/IPEA/409 (Boxes I-VIII, Sheet 2) (July 1998)



INTERNATIONAL PRELIMINARY EXAMINATION REPORT - SEPARATE SHEET

International application No. PCT/GB00/01002



Re Item V 

Reasoned statement under Art. 35(2) PCT with regard to novelty, inventive step or 
industrial applicability. 

The application does not meet the requirements of Art. 33 PCT since claims 1-23 do not
appear to contain an inventive step. 

1) Reference is made to the following documents (the document numbering 
corresponds to their order of citation in the international search report):

D1: WO 97 20060 A (UNIV JOHNS HOPKINS MED) 5 June 1997 (1997-06-05)

D5: YUAN Y ET AL: 'Targeted cleavage of mRNA by human RNase P'
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF USA, vol.
89, no. 89, September 1992 (1992-09), pages 8006-8010, XP002104826 ISSN:
0027-8424 cited in the application

Novelty under Art. 33(2) PCT.

2) Although the prior art discloses vector systems which utilize ribozymes to directly 
cleave wild-type viral RNA, no documents describe such a system with External Guide 
Sequences to cause indirect cleavage of RNA. The systems, methods and viral 
particles of claims 1-23 are therefore novel.

Inventive Step under Art. 33(3) PCT. 

3) Document D1 can be regarded as the closest prior art. In this document, vector 
systems are described in which ribozymes are used to target and cleave viral RNA. 
Specific examples of D1 include HIV vector systems, which include one or more 
ribozymes targeting sequences within the wild-type viral RNA. Furthermore, said 
ribozymes do not cleave within sequences of the described vectors, since the vector 
sequences, although still supporting packaging of the vector, have been altered from 
wild-type sequences so that the ribozyme recognition sites are absent.

Thus, the invention of D1 and that of the present application differ in
that D1 only describes the use of ribozymes, but not of External Guide Sequences, to
direct cleavage of viral RNA.

The technical problem at the basis of the present application was therefore to find 
means of targeting and cleaving viral RNA alternative to the use of ribozymes. 

The solution, the use of specifically designed External Guide Sequences, however, 
cannot be viewed as being inventive. 

Document D5 describes the successful construction of External Guide Sequences, 
for directed cleavage of a reporter gene RNA in human cells. Furthermore, D5
explains how to construct further EG sequences to cleave any desired RNA, and 
proposes this technique as a "general technique for gene inactivation". The document 
also states that this approach "has an advantage over techniques of gene inactivation 
that involve antisense RNA or other, exogenous ribozymes" (pg. 8010). 
The person skilled in the art would therefore try to use the technique of D5, in the 
method of D1. He would therefore try, with reasonable chance of success, to 
construct sequences for EGS-directed cleavage of viral RNA, preferably for regions 
within the viral genome already targeted by the method of D1 (i.e., the HIV gag-pol 
sequence). The skilled person would use these sequences in place of, or additional 
to, the ribozyme coding sequences within the constructs of D1, and arrive at the 
invention of the present application. 

Consequently, these methods and entities, and accordingly also claims 1-23, lack an 
inventive step.

ANTI-VIRAL VECTORS



Field of the Invention 

The present invention relates to novel viral vectors capable of delivering anti-viral 
inhibitory RNA molecules to target cells. 

Background to the Invention

The application of gene therapy to the treatment of AIDS and HIV infection has been
discussed widely (Lever, 1995). The types of therapeutic gene proposed usually fall into
one of two broad categories. In the first the gene encodes protein products that inhibit the
virus in a number of possible ways. One example of such a protein is the RevM10
derivative of the HIV Rev protein. The RevM10 protein acts as a transdominant negative
mutant and so competitively inhibits Rev function in the virus. Like many of the
protein-based strategies, the RevM10 protein is a derivative of a native HIV protein. While this
provides the basis for the anti-HIV effect, it also has serious disadvantages. In particular,
this type of strategy demands that in the absence of the virus there is little or no expression 
of the gene. Otherwise, healthy cells harbouring the gene become a target for the host 
cytotoxic T lymphocyte (CTL) system, which recognises the foreign protein. The second 
broad category of therapeutic gene circumvents these CTL problems. The therapeutic gene 
encodes inhibitory RNA molecules; RNA is not a target for CTL recognition. 

There are several types of inhibitory RNA molecules known: anti-sense RNA, ribozymes, 
competitive decoys and external guide sequences (EGSs).

External guide sequences, first identified by Forster and Altman (1990), are RNA
sequences that are capable of directing the cellular protein RNase P to cleave a particular
RNA sequence. In vivo, they are found as part of precursor tRNAs, where they function to
direct cleavage of the tRNA precursor by the cellular riboprotein RNase P to form
mature tRNA. However, in principle, any RNA can be targeted by a custom-designed EGS
RNA for specific cleavage by RNase P in vitro or in vivo. For example, Yuan et al. (1992)
demonstrate a reduction in the levels of chloramphenicol acetyltransferase activity in cells in
tissue culture as a result of introducing an appropriately designed EGS.

In recent years a number of laboratories have developed retroviral vector systems based on
HIV. In the context of anti-HIV gene therapy these vectors have a number of advantages
over the more conventional murine based vectors such as murine leukaemia virus (MLV)
vectors. Firstly, HIV vectors would target precisely those cells that are susceptible to HIV
infection. Secondly, the HIV-based vector would transduce cells such as macrophages that
are normally refractory to transduction by murine vectors. Thirdly, the anti-HIV vector
genome would be propagated through the CD4+ cell population by any virus (HIV) that
escaped the therapeutic strategy. This is because the vector genome has the packaging
signal that will be recognised by the viral particle packaging system. These various
attributes make HIV vectors a powerful tool in the field of anti-HIV gene therapy.

A combination of inhibitory RNA molecules and an HIV-based vector would be attractive
as a therapeutic strategy. However, until now this has not been possible. Vector particle
production takes place in producer cells which express the packaging components of the
particles and package the vector genome. The inhibitory RNA sequences that are designed
to destroy the viral RNA would therefore also interrupt the expression of the components
of the HIV-based vector system during vector production. The present invention aims to
overcome this problem.

Summary of the Invention

It is therefore an object of the invention to provide a system and method for producing viral
particles, in particular HIV particles, which carry nucleotide constructs encoding inhibitory
RNA molecules such as external guide sequences, optionally together with other classes of
inhibitory RNA molecules such as ribozymes and/or antisense RNAs directed against a
corresponding virus, such as HIV, within a target cell, that overcomes the above-mentioned
problems. The system includes both a viral genome encoding the inhibitory RNA molecules
and nucleotide constructs encoding the components required for packaging the viral
genome in a producer cell. However, in contrast to the prior art, although the packaging
components have substantially the same amino acid sequence as the corresponding
components of the target virus, the inhibitory RNA molecules do not affect production of
the viral particles in the producer cells because the nucleotide sequence of the packaging
components used in the viral system has been modified to prevent the inhibitory RNA
molecules from effecting cleavage or degradation of the RNA transcripts produced from
the constructs. Such a viral particle may be used to treat viral infections, in particular HIV
infections.

Accordingly the present invention provides a viral vector system comprising: 

(i) a first nucleotide sequence encoding an external guide sequence capable of binding
to and effecting the cleavage by RNase P of a second nucleotide sequence, or transcription
product thereof, encoding a viral polypeptide required for the assembly of viral particles;
and

(ii) a third nucleotide sequence encoding said viral polypeptide required for the
assembly of viral particles, which third nucleotide sequence has a different nucleotide
sequence to the second nucleotide sequence such that the third nucleotide sequence, or
transcription product thereof, is resistant to cleavage directed by the external guide
sequence.

Preferably, said system further comprises at least one further first nucleotide sequence
encoding a gene product capable of binding to and effecting the cleavage, directly or
indirectly, of a second nucleotide sequence, or transcription product thereof, encoding a
viral polypeptide required for the assembly of viral particles, wherein the gene product is
selected from an external guide sequence, a ribozyme and an anti-sense ribonucleic acid.

In another aspect, the present invention provides a viral vector production system
comprising:

(i) a viral genome comprising at least one first nucleotide sequence encoding a gene
product capable of binding to and effecting the cleavage, directly or indirectly, of a second
nucleotide sequence, or transcription product thereof, encoding a viral polypeptide required
for the assembly of viral particles;

(ii) a third nucleotide sequence encoding said viral polypeptide required for the
assembly of the viral genome into viral particles, which third nucleotide sequence has a
different nucleotide sequence to the second nucleotide sequence such that said third
nucleotide sequence, or transcription product thereof, is resistant to cleavage directed by
said gene product;

wherein at least one of the gene products is an external guide sequence capable of binding
to and effecting the cleavage by RNase P of the second nucleotide sequence.

Preferably, in addition to an external guide sequence, at least one gene product is selected 
from a ribozyme and an anti-sense ribonucleic acid, preferably a ribozyme. 

Preferably, the viral vector is a retroviral vector, more preferably a lentiviral vector, such as
an HIV vector. The second nucleotide sequence and the third nucleotide sequence are
typically from the same viral species, more preferably from the same viral strain.
Generally, the viral genome is also from the same viral species, more preferably from the
same viral strain.

In the case of retroviral vectors, the polypeptide required for the assembly of viral particles
is selected from gag, pol and env proteins. Preferably at least the gag and pol sequences
are lentiviral sequences, more preferably HIV sequences. Alternatively, or in addition, the
env sequence is a lentiviral sequence, more preferably an HIV sequence.

In a preferred embodiment, the third nucleotide sequence is resistant to cleavage directed
by the gene product as a result of one or more conservative alterations in the nucleotide
sequence which remove cleavage sites recognised by the at least one gene product and/or
binding sites for the at least one gene product. For example, where the gene product is an
EGS, the third nucleotide sequence is adapted to prevent EGS binding and/or to remove the
RNase P consensus cleavage site. Alternatively, where the gene product is a ribozyme, the
third nucleotide sequence is adapted to be resistant to cleavage by the ribozyme.
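
The effect of such conservative alterations can be illustrated with a minimal sketch. It assumes that "conservative" means synonymous codon changes that leave the encoded polypeptide unchanged, and that resistance can be approximated by the guide target site no longer occurring in the recoded sequence. The example sequence, the target site and the rule of picking the alphabetically first synonymous codon are hypothetical choices made for illustration; they are not taken from the application, and a real design would also have to consider EGS binding and RNase P recognition in detail.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding sequence (length a multiple of 3) into one-letter amino acids."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def recode(dna):
    """Swap every codon for a synonymous one where an alternative exists."""
    recoded = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        amino_acid = CODON_TABLE[codon]
        alternatives = sorted(
            c for c, aa in CODON_TABLE.items() if aa == amino_acid and c != codon
        )
        # Met and Trp have a single codon, so they are left unchanged.
        recoded.append(alternatives[0] if alternatives else codon)
    return "".join(recoded)


# Hypothetical "second" (wild-type) sequence and hypothetical guide target site.
second_sequence = "ATGGGTGCGAGAGCGTCA"
target_site = second_sequence[3:15]

# Hypothetical "third" sequence: same polypeptide, different nucleotides.
third_sequence = recode(second_sequence)

print("same polypeptide:", translate(second_sequence) == translate(third_sequence))
print("target site in second sequence:", target_site in second_sequence)
print("target site in third sequence:", target_site in third_sequence)
```

In this sketch the second and third sequences translate to the same polypeptide, while the chosen target site is present only in the second sequence.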

Preferably the third nucleotide sequence is codon optimised for expression in host cells.
The host cells, which term includes producer cells and packaging cells, are typically
mammalian cells.



In a particularly preferred embodiment, (i) the viral genome is an HIV genome comprising
nucleotide sequences encoding anti-HIV EGSs and optionally anti-HIV ribozyme
sequences directed against HIV packaging component sequences (such as gag.pol) in a
target HIV and (ii) the viral system for producing packaged HIV particles further comprises
nucleotide constructs encoding the same packaging components (such as gag.pol proteins)
as in the target HIV wherein the sequence of the nucleotide constructs is different from that
found in the target HIV so that the anti-HIV EGS and anti-HIV ribozyme sequences cannot
effect cleavage or degradation of the gag.pol transcripts during production of the HIV
particles in producer cells.

The present invention also provides a viral particle comprising a viral vector according to
the present invention and one or more polypeptides encoded by the third nucleotide
sequences according to the present invention. For example the present invention provides
a viral particle produced using the viral vector production system of the invention.

In another aspect, the present invention provides a method for producing a viral particle
which method comprises introducing into a host cell (i) a viral genome vector according to
the present invention; (ii) one or more third nucleotide sequences according to the present
invention; and (iii) nucleotide sequences encoding the other essential viral packaging
components not encoded by the one or more third nucleotide sequences.

The present invention further provides a viral particle produced by the method of the
invention.

The present invention also provides a pharmaceutical composition comprising a viral
particle according to the present invention together with a pharmaceutically acceptable
carrier or diluent.

The viral system of the invention or viral particles of the invention may be used to treat
viral infections, particularly retroviral infections such as lentiviral infections including HIV
infections. Thus the present invention provides a method of treating a viral infection which
method comprises administering to a human or animal patient suffering from the viral
infection an effective amount of a viral system, viral particle or pharmaceutical
composition of the present invention.

The invention relates in particular to HIV-based vectors carrying anti-HIV EGSs.
However, the invention can be applied to any other virus, in particular any other lentivirus,
for which treatment by gene therapy may be desirable. The invention is illustrated herein
for HIV, but this is not considered to limit the scope of the invention to HIV-based
anti-HIV vectors.

Detailed Description of the Invention 

The term "viral vector" refers to a nucleotide construct comprising a viral genome capable
of being transcribed in a host cell, which genome comprises sufficient viral genetic
information to allow packaging of the viral RNA genome, in the presence of packaging
components, into a viral particle capable of infecting a target cell. Infection of the target
cell includes reverse transcription and integration into the target cell genome, where
appropriate for particular viruses. The viral vector in use typically carries heterologous
coding sequences (nucleotides of interest) which are to be delivered by the vector to the
target cell, for example a first nucleotide sequence encoding an EGS. A viral vector is
incapable of independent replication to produce infectious viral particles within the final
target cell.

The term "viral vector system" is intended to mean a kit of parts which can be used when
combined with other necessary components for viral particle production to produce viral
particles in host cells. For example, the first nucleotide sequence may typically be present
in a plasmid vector construct suitable for cloning the first nucleotide sequence into a viral
genome vector construct. When combined in a kit with a third nucleotide sequence, which
will also typically be present in a separate plasmid vector construct, the resulting
combination of plasmid containing the first nucleotide sequence and plasmid containing
the third nucleotide sequence comprises the essential elements of the invention. Such a kit
may then be used by the skilled person in the production of suitable viral vector genome
constructs which when transfected into a host cell together with the plasmid containing the
third nucleotide sequence, and optionally nucleic acid constructs encoding other
components required for viral assembly, will lead to the production of infectious viral
particles.

Alternatively, the third nucleotide sequence may be stably present within a packaging cell
line that is included in the kit.

The kit may include the other components needed to produce viral particles, such as host
cells and other plasmids encoding essential viral polypeptides required for viral assembly.
By way of example, the kit may contain (i) a plasmid containing a first nucleotide sequence
encoding an anti-HIV EGS and (ii) a plasmid containing a third nucleotide sequence
encoding a modified HIV gag.pol construct which cannot be cleaved by the anti-HIV
ribozyme. Optional components would then be (a) an HIV viral genome construct with
suitable restriction enzyme recognition sites for cloning the first nucleotide sequence into
the viral genome; (b) a plasmid encoding a VSV-G env protein. Alternatively, nucleotide
sequences encoding viral polypeptides required for assembly of viral particles may be
provided in the kit as packaging cell lines comprising the nucleotide sequences, for
example a VSV-G expressing cell line.

The term "viral vector production system" refers to the viral vector system described above
wherein the first nucleotide sequence has already been inserted into a suitable viral vector
genome.

Viral vectors are typically retroviral vectors, in particular lentiviral vectors such as HIV
vectors. The retroviral vector of the present invention may be derived from or may be
derivable from any suitable retrovirus. A large number of different retroviruses have been
identified. Examples include: murine leukemia virus (MLV), human immunodeficiency
virus (HIV), simian immunodeficiency virus, human T-cell leukemia virus (HTLV),
equine infectious anaemia virus (EIAV), mouse mammary tumour virus (MMTV), Rous
sarcoma virus (RSV), Fujinami sarcoma virus (FuSV), Moloney murine leukemia virus
(Mo-MLV), FBR murine osteosarcoma virus (FBR MSV), Moloney murine sarcoma virus
(Mo-MSV), Abelson murine leukemia virus (A-MLV), Avian myelocytomatosis virus-29
(MC29), and Avian erythroblastosis virus (AEV). A detailed list of retroviruses may be
found in Coffin et al., 1997, "Retroviruses", Cold Spring Harbor Laboratory Press Eds:
JM Coffin, SM Hughes, HE Varmus pp 758-763.

Details on the genomic structure of some retroviruses may be found in the art. By way of
example, details on HIV and Mo-MLV may be found from the NCBI Genbank (Genome
Accession Nos. AF033819 and AF033811, respectively).

The lentivirus group can be split even further into "primate" and "non-primate". Examples
of primate lentiviruses include human immunodeficiency virus (HIV), the causative agent
of acquired immunodeficiency syndrome (AIDS), and simian immunodeficiency virus
(SIV). The non-primate lentiviral group includes the prototype "slow virus" visna/maedi
virus (VMV), as well as the related caprine arthritis-encephalitis virus (CAEV), equine
infectious anaemia virus (EIAV) and the more recently described feline immunodeficiency
virus (FIV) and bovine immunodeficiency virus (BIV).



The basic structure of a retrovirus genome is a 5' LTR and a 3' LTR, between or within
which are located a packaging signal to enable the genome to be packaged, a primer
binding site, integration sites to enable integration into a host cell genome and gag, pol and
env genes encoding the packaging components - these are polypeptides required for the
assembly of viral particles. More complex retroviruses have additional features, such as
rev and RRE sequences in HIV, which enable the efficient export of RNA transcripts of the
integrated provirus from the nucleus to the cytoplasm of an infected target cell.


In the provirus, these genes are flanked at both ends by regions called long terminal repeats
(LTRs). The LTRs are responsible for proviral integration and transcription. LTRs also
serve as enhancer-promoter sequences and can control the expression of the viral genes.
Encapsidation of the retroviral RNAs occurs by virtue of a psi sequence located at the 5'
end of the viral genome.



The LTRs themselves are identical sequences that can be divided into three elements,
which are called U3, R and U5. U3 is derived from the sequence unique to the 3' end of
the RNA. R is derived from a sequence repeated at both ends of the RNA and U5 is
derived from the sequence unique to the 5' end of the RNA. The sizes of the three
elements can vary considerably among different retroviruses.

In a defective retroviral vector genome gag, pol and env may be absent or not functional.
The R regions at both ends of the RNA are repeated sequences. U5 and U3 represent
unique sequences at the 5' and 3' ends of the RNA genome respectively.

In a typical retroviral vector for use in gene therapy, at least part of one or more of the gag,
pol and env protein coding regions essential for replication may be removed from the virus.
This makes the retroviral vector replication-defective. The removed portions may even be
replaced by a nucleotide sequence of interest (NOI), such as a first nucleotide sequence of
the invention, to generate a virus capable of integrating its genome into a host genome but
wherein the modified viral genome is unable to propagate itself due to a lack of structural
proteins. When integrated in the host genome, expression of the NOI occurs - resulting in,
for example, a therapeutic and/or a diagnostic effect. Thus, the transfer of an NOI into a
site of interest is typically achieved by: integrating the NOI into the recombinant viral
vector; packaging the modified viral vector into a virion coat; and allowing transduction of
a site of interest - such as a targeted cell or a targeted cell population.

A minimal retroviral genome for use in the present invention will therefore comprise (5') R 
- U5 - one or more first nucleotide sequences - U3-R (3'). However, the plasmid vector 
used to produce the retroviral genome within a host cell/packaging cell will also include 
transcriptional regulatory control sequences operably linked to the retroviral genome to
direct transcription of the genome in a host cell/packaging cell. These regulatory 
sequences may be the natural sequences associated with the transcribed retroviral sequence, 
i.e. the 5' U3 region, or they may be a heterologous promoter such as another viral 
promoter, for example the CMV promoter. 


Some retroviral genomes require additional sequences for efficient virus production. For
example, in the case of HIV, rev and RRE sequences are preferably included. However, the
requirement for rev and RRE can be reduced or eliminated by codon optimisation.

Once the retroviral vector genome is integrated into the genome of its target cell as proviral
DNA, the ribozyme sequences need to be expressed. In a retrovirus, the promoter is
located in the 5' LTR U3 region of the provirus. In retroviral vectors, the promoter driving
expression of a therapeutic gene may be the native retroviral promoter in the 5' U3 region,
or an alternative promoter engineered into the vector. The alternative promoter may
physically replace the 5' U3 promoter native to the retrovirus, or it may be incorporated at
a different place within the vector genome such as between the LTRs.

Thus, the first nucleotide sequence will also be operably linked to a transcriptional
regulatory control sequence to allow transcription of the first nucleotide sequence to occur
in the target cell. The control sequence will typically be active in mammalian cells. The
control sequence may, for example, be a viral promoter such as the natural viral promoter
or a CMV promoter or it may be a mammalian promoter. It is particularly preferred to use
a promoter that is preferentially active in a particular cell type or tissue type that the
virus to be treated primarily infects. Thus, in one embodiment, tissue-specific regulatory
sequences may be used. The regulatory control sequences driving expression of the one or
more first nucleotide sequences may be constitutive or regulated promoters.

Replication-defective retroviral vectors are typically propagated, for example to prepare
suitable titres of the retroviral vector for subsequent transduction, by using a combination
of a packaging or helper cell line and the recombinant vector. That is to say, the three
packaging proteins can be provided in trans.

A "packaging cell line" contains one or more of the retroviral gag, pol and env genes. The
packaging cell line produces the proteins required for packaging retroviral DNA but it
cannot bring about encapsidation due to the lack of a psi region. However, when a
recombinant vector carrying an NOI and a psi region is introduced into the packaging cell
line, the helper proteins can package the psi-positive recombinant vector to produce the
recombinant virus stock. This virus stock can be used to transduce cells to introduce the
NOI into the genome of the target cells. It is preferred to use a psi packaging signal, called
psi plus, that contains additional sequences spanning from upstream of the splice donor to
downstream of the gag start codon (Bender et al, 1987) since this has been shown to
increase viral titres.


The recombinant virus whose genome lacks all genes required to make viral proteins can
transduce only once and cannot propagate. These viral vectors which are only capable of a
single round of transduction of target cells are known as replication defective vectors.
Hence, the NOI is introduced into the host/target cell genome without the generation of
potentially harmful retrovirus. A summary of the available packaging lines is presented in
Coffin et al, 1997 (ibid).

Retroviral packaging cell lines in which the gag, pol and env viral coding regions are
carried on separate expression plasmids that are independently transfected into a packaging
cell line are preferably used. This strategy, sometimes referred to as the three plasmid
transfection method (Soneoka et al., 1995), reduces the potential for production of a
replication-competent virus since three recombinant events are required for wild type viral
production. As recombination is greatly facilitated by homology, reducing or eliminating
homology between the genomes of the vector and the helper can also be used to reduce the
problem of replication-competent helper virus production.



An alternative to stably transfected packaging cell lines is to use transiently transfected cell 
lines. Transient transfections may advantageously be used to measure levels of vector
production when vectors are being developed. In this regard, transient transfection avoids 
the longer time required to generate stable vector-producing cell lines and may also be used 
if the vector or retroviral packaging components are toxic to cells. Components typically 
used to generate retroviral vectors include a plasmid encoding the gag/pol proteins, a 
plasmid encoding the env protein and a plasmid containing an NOI. Vector production
involves transient transfection of one or more of these components into cells containing the 
other required components. If the vector encodes toxic genes or genes that interfere with 
the replication of the host cell, such as inhibitors of the cell cycle or genes that induce 
apoptosis, it may be difficult to generate stable vector-producing cell lines, but transient
transfection can be used to produce the vector before the cells die. Also, cell lines have
been developed using transient transfection that produce vector titre levels that are 
comparable to the levels obtained from stable vector-producing cell lines (Pear et al., 
1993). 



Producer cells/packaging cells can be of any suitable cell type. Most commonly,
mammalian producer cells are used but other cells, such as insect cells, are not excluded.
Clearly, the producer cells will need to be capable of efficiently translating the env and gag,
pol mRNA. Many suitable producer/packaging cell lines are known in the art. The skilled
person is also capable of making suitable packaging cell lines by, for example, stably
introducing a nucleotide construct encoding a packaging component into a cell line.



As will be discussed below, where the retroviral genome encodes an inhibitory RNA
molecule capable of effecting the cleavage of gag, pol and/or env RNA transcripts, the
nucleotide sequences present in the packaging cell line, either integrated or carried on
plasmids, or in the transiently transfected producer cell line, which encode gag, pol and/or
env proteins will be modified so as to reduce or prevent binding of the inhibitory RNA
molecule(s). In this way, the inhibitory RNA molecule(s) will not prevent expression of
components in packaging cell lines that are essential for packaging of viral particles.

It is highly desirable to use high-titre virus preparations in both experimental and practical
applications. Techniques for increasing viral titre include using a psi plus packaging signal
as discussed above and concentration of viral stocks. In addition, the use of different
envelope proteins, such as the G protein from vesicular-stomatitis virus has improved titres
following concentration to 10^ per ml (Cosset et al., 1995). However, typically the
envelope protein will be chosen such that the viral particle will preferentially infect cells
that are infected with the virus which it is desired to treat. For example where an HIV vector
is being used to treat HIV infection, the env protein used will be the HIV env protein.


Suitable first nucleotide sequences for use according to the present invention encode gene 
products that result in the cleavage and/or enzymatic degradation of a target nucleotide 
sequence, which will generally be a ribonucleotide. As particular examples, EGSs, 
ribozymes, and antisense sequences may be mentioned, more specifically EGSs. 


External guide sequences (EGSs) are RNA sequences that bind to a complementary target
sequence to form a loop in the target RNA sequence, the overall structure being a substrate
for RNaseP-mediated cleavage of the target RNA sequence. The structure that forms when
the EGS anneals to the target RNA is very similar to that found in a tRNA precursor. The
natural activity of RNaseP can thus be directed to cleave a target RNA by designing a
suitable EGS. The general rules for EGS design are as follows, with reference to the
generic EGSs shown in Figure 9B:

Rules for EGS design in mammalian cells (see Figure 9B)

Target sequence - All tRNA precursor molecules have a G immediately 3' of the RNaseP
cleavage site (i.e. the G forms a base pair with the C at the top of the acceptor stem prior to
the ACCA sequence). In addition a U is found 8 nucleotides downstream in all tRNAs
(i.e. G at position 1, U at position 8). A pyrimidine may be preferred 5' of the cut site. No
other specific target sequences are required.



EGS sequence - A 7 nucleotide 'acceptor stem' analogue is optimal (5' hybridising arm). 
A 4 nucleotide 'D-stem' analogue is preferred (3' hybridising arm). Variation in this
length may alter the reaction kinetics. This will be specific to each target site. A consensus 
'T-stem and loop' analogue is essential. Minimal 5' and 3' non-pairing sequences are 
preferred to reduce the potential for undesired folding of the EGS RNA. 

Deletion of the 'anti-codon stem and loop' analogue may be beneficial. Deletion of the
variable loop can also be tolerated in vitro but an optimal replacement loop for the deletion 
of both has not been defined in vivo. 
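
The target sequence rule above (a G at position 1 and a U at position 8, with a pyrimidine
possibly preferred 5' of the cut site) can be illustrated with a minimal Python sketch. The
function name `find_egs_target_sites` and the example sequence are hypothetical and are not
taken from the Examples. The sketch assumes that "position 8" means the base seven
nucleotides 3' of the G, and it does not attempt to design the EGS itself.

```python
def find_egs_target_sites(rna):
    """Return candidate RNaseP/EGS target sites in an RNA sequence.

    Each hit is (g_index, pyrimidine_5prime). g_index is the 0-based index
    of the G at position 1, so cleavage would fall immediately 5' of it.
    pyrimidine_5prime is True when the base 5' of the cut is C or U.
    """
    rna = rna.upper().replace("T", "U")
    hits = []
    for i in range(len(rna) - 7):
        # Position 1 must be G; position 8 (seven bases further 3') must be U.
        if rna[i] == "G" and rna[i + 7] == "U":
            pyrimidine_5prime = i > 0 and rna[i - 1] in "CU"
            hits.append((i, pyrimidine_5prime))
    return hits


# Hypothetical sequence, for illustration only.
sites = find_egs_target_sites("ACGAAAAAAUCC")
```

Each candidate would still need to be checked against the EGS sequence rules given above
and tested experimentally.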



As with ribozymes, described below, it is preferred to use more than one EGS. Preferably,
a plurality of EGSs is employed, together capable of cleaving gag, pol and env RNA of the
native retrovirus at a plurality of sites. Since HIV exists as a population of quasispecies,
not all of the target sequences for the EGSs will be included in all HIV variants. The
problem presented by this variability can be overcome by using multiple EGSs. Multiple
EGSs can be included in series in a single vector and can function independently when
expressed as a single RNA sequence. A single RNA containing two or more EGSs having
different target recognition sites may be referred to as a multitarget EGS.

Further guidance may be obtained by reference to, for example, Werner et al. (1997);
Werner et al. (1998); Ma et al. (1998) and Kawa et al. (1998).


Ribozymes are RNA enzymes which cleave RNA at specific sites. Ribozymes can be
engineered so as to be specific for any chosen sequence containing a ribozyme cleavage
site. Thus, ribozymes can be engineered which have chosen recognition sites in transcribed
viral sequences. By way of an example, ribozymes encoded by the first nucleotide
sequence recognise and cleave essential elements of viral genomes required for the
production of viral particles, such as packaging components. Thus, for retroviral genomes,
such essential elements include the gag, pol and env gene products. A suitable ribozyme
capable of recognising at least one of the gag, pol and env gene sequences, or more
typically, the RNA sequences transcribed from these genes, is able to bind to and cleave
such a sequence. This will reduce or prevent production of the gag, pol or env protein as
appropriate and thus reduce or prevent the production of retroviral particles.



Ribozymes come in several forms, including hammerhead, hairpin and hepatitis delta
antigenomic ribozymes. Preferred for use herein are hammerhead ribozymes, in part
because of their relatively small size, because the sequence requirements for their target
cleavage site are minimal and because they have been well characterised. The ribozymes
most commonly used in research at present are hammerhead and hairpin ribozymes.


Each individual ribozyme has a motif which recognises and binds to a recognition site in
the target RNA. This motif takes the form of one or more "binding arms", generally two
binding arms. The binding arms in hammerhead ribozymes are the flanking sequences
Helix I and Helix III, which flank Helix II. These can be of variable length, usually
between 6 and 10 nucleotides each, but can be shorter or longer. The length of the flanking
sequences can affect the rate of cleavage. For example, it has been found that reducing the
total number of nucleotides in the flanking sequences from 20 to 12 can increase the
turnover rate of the ribozyme cleaving an HIV sequence by 10-fold (Goodchild et al.,
1991). A catalytic motif in the ribozyme Helix II in hammerhead ribozymes cleaves the
target RNA at a site which is referred to as the cleavage site. Whether or not a ribozyme
will cleave any given RNA is determined by the presence or absence of a recognition site
for the ribozyme containing an appropriate cleavage site.



Each type of ribozyme recognises its own cleavage site. The hammerhead ribozyme
cleavage site has the nucleotide base triplet GUX directly upstream where G is guanine, U
is uracil and X is any nucleotide base. Hairpin ribozymes have a cleavage site of
BCUGNYR, where B is any nucleotide base other than adenine, N is any nucleotide, Y is
cytosine or thymine and R is guanine or adenine. Cleavage by hairpin ribozymes takes
place between the G and the N in the cleavage site.
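
By way of illustration only, the two cleavage site definitions above can be expressed as
simple pattern searches. The following Python sketch is not part of the described method;
the function names are hypothetical, and it assumes that Y is read as cytosine or uracil
when the sequence is RNA. A returned position p means that cleavage would fall between
the nucleotides at 0-based indices p - 1 and p.

```python
import re


def hammerhead_cut_positions(rna):
    """Cut positions immediately 3' of each GUX triplet (X = any base)."""
    rna = rna.upper().replace("T", "U")
    # A lookahead is used so that overlapping triplets are all reported.
    return [m.start() + 3 for m in re.finditer(r"(?=GU[ACGU])", rna)]


def hairpin_cut_positions(rna):
    """Cut positions for the BCUGNYR motif, cleaving between G and N."""
    rna = rna.upper().replace("T", "U")
    # B = any base except A; N = any base; Y = C or U; R = G or A.
    motif = r"(?=[CGU]CUG[ACGU][CU][AG])"
    return [m.start() + 4 for m in re.finditer(motif, rna)]
```

A match only indicates a candidate cleavage site; whether a given ribozyme cleaves there
also depends on the flanking sequences bound by its binding arms.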

The nucleic acid sequences encoding the packaging components (the "third nucleotide 
sequences") may be resistant to the ribozyme or ribozymes because they lack any cleavage
sites for the ribozyme or ribozymes. This prohibits enzymatic activity by the ribozyme or 
ribozymes and therefore there is no effective recognition site for the ribozyme or 
ribozymes. Alternatively or additionally, the potential recognition sites may be altered in 
the flanking sequences which form the part of the recognition site to which the ribozyme 
binds. This either eliminates binding of the ribozyme motif to the recognition site, or
reduces binding capability enough to destabilise any ribozyme-target complex and thus
reduce the specificity and catalytic activity of the ribozyme. Where the flanking sequences 
only are altered, they are preferably altered such that catalytic activity of the ribozyme at 
the altered target sequence is negligible and is effectively eliminated. 


Preferably, a series of several anti-HIV ribozymes is employed in the invention. These can
be any anti-HIV ribozymes but must include one or more which cleave the RNA that is
required for the expression of gag, pol or env. Preferably, a plurality of ribozymes is
employed, together capable of cleaving gag, pol and env RNA of the native retrovirus at a
plurality of sites. Since HIV exists as a population of quasispecies, not all of the target
sequences for the ribozymes will be included in all HIV variants. The problem presented
by this variability can be overcome by using multiple ribozymes. Multiple ribozymes can
be included in series in a single vector and can function independently when expressed as a
single RNA sequence. A single RNA containing two or more ribozymes having different
target recognition sites may be referred to as a multitarget ribozyme. The placement of
ribozymes in series has been demonstrated to enhance cleavage. The use of a plurality of
ribozymes is not limited to treating HIV infection but may be used in relation to other
viruses, retroviruses or otherwise.
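
The reasoning behind using multiple ribozymes against a population of quasispecies can be
sketched numerically. The Python fragment below is a hedged illustration only: the
function name `covered_fraction` is hypothetical, the variant sequences are made-up
placeholders rather than real HIV data, and a variant is simply counted as covered when it
contains an exact copy of at least one target site.

```python
def covered_fraction(variants, target_sites):
    """Fraction of variant sequences containing at least one target site."""
    if not variants:
        return 0.0
    hit = sum(any(site in v for site in target_sites) for v in variants)
    return hit / len(variants)


# Made-up placeholder sequences, not real HIV data.
variants = [
    "AAGUCAAGGUACC",
    "AAGACAAGGUACC",
    "AAGUCAAGCUACC",
    "AAGACAAGCUACC",
]
single = covered_fraction(variants, ["AAGUCAA"])
multi = covered_fraction(variants, ["AAGUCAA", "AGGUACC"])
```

In this toy population a single target site covers half of the variants, whereas two
target sites together cover three of the four.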



Antisense technology is well known in the art. There are various mechanisms by which
antisense sequences are believed to inhibit gene expression. One mechanism by which
antisense sequences are believed to function is the recruitment of the cellular protein
RNaseH to the target sequence/antisense construct heteroduplex which results in cleavage
and degradation of the heteroduplex. Thus the antisense construct, by contrast to
ribozymes, can be said to lead indirectly to cleavage/degradation of the target sequence.
Thus according to the present invention, a first nucleotide sequence may encode an
antisense RNA that binds to either a gene encoding an essential/packaging component or
the RNA transcribed from said gene such that expression of the gene is inhibited, for
example as a result of RNaseH degradation of a resulting heteroduplex. It is not necessary
for the antisense construct to encode the entire complementary sequence of the gene
encoding an essential/packaging component - a portion may suffice. The skilled person
will easily be able to determine how to design a suitable antisense construct.


By contrast, the nucleic acid sequences encoding the essential/packaging components of 
the viral particles required for the assembly of viral particles in the host cells/producer 
cells/packaging cells (the third nucleotide sequences) are resistant to the inhibitory RNA 
molecules encoded by the first nucleotide sequence. For example in the case of ribozymes,
resistance is typically by virtue of alterations in the sequences which eliminate the
ribozyme recognition sites. At the same time, the amino acid coding sequence for the 
essential/packaging components is retained so that the viral components encoded by the 
sequences remain the same, or at least sufficiently similar that the function of the 
essential/packaging components is not compromised. 


The term "viral polypeptide required for the assembly of viral particles" means a
polypeptide normally encoded by the viral genome to be packaged into viral particles, in
the absence of which the viral genome cannot be packaged. For example, in the context of
retroviruses such polypeptides would include gag, pol and env. The terms "packaging
component" and "essential component" are also included within this definition.

In the case of antisense sequences, the third nucleotide sequence differs from the second
nucleotide sequence encoding the target viral packaging component antisense sequence to
the extent that although the antisense sequence can bind to the second nucleotide sequence,
or transcript thereof, the antisense sequence cannot bind effectively to the third nucleotide
sequence or RNA transcribed therefrom. The changes between the second and third
nucleotide sequences will typically be conservative changes, although a small number of
amino acid changes may be tolerated provided that, as described above, the function of the
essential/packaging components is not significantly impaired.



Preferably, in addition to eliminating the inhibitory RNA recognition sites, the alterations 
to the coding sequences for the viral components improve the sequences for codon usage in 
the mammalian cells or other cells which are to act as the producer cells for retroviral 
vector particle production. This improvement in codon usage is referred to as "codon 
optimisation". Many viruses, including HIV and other lentiviruses, use a large number of 
rare codons and by changing these to correspond to commonly used mammalian codons, 
increased expression of the packaging components in mammalian producer cells can be 
achieved. Codon usage tables are known in the art for mammalian cells, as well as for a
variety of other organisms. 
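
The principle of codon optimisation described above, namely changing codons while
retaining the encoded amino acid sequence, can be sketched as follows. This Python
fragment is a minimal illustration under stated assumptions and is not the procedure used
to generate the sequences of the Examples: the `PREFERRED` table is a small illustrative
placeholder rather than measured codon usage data, the function names are hypothetical,
and codons for amino acids not listed in `PREFERRED` are left unchanged.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Illustrative preferences only (assumption); a real table would be built
# from codon usage data for the chosen producer cell.
PREFERRED = {"L": "CUG", "A": "GCC", "K": "AAG", "G": "GGC", "I": "AUC"}


def translate(rna):
    """Translate an in-frame RNA coding sequence into one-letter amino acids."""
    usable = len(rna) - len(rna) % 3
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, usable, 3))


def codon_optimise(rna, preferred=PREFERRED):
    """Replace each codon with the preferred synonymous codon, if one is listed."""
    usable = len(rna) - len(rna) % 3
    out = []
    for i in range(0, usable, 3):
        codon = rna[i:i + 3]
        out.append(preferred.get(CODON_TABLE[codon], codon))
    return "".join(out)


# Hypothetical A/U-rich fragment, for illustration only.
wild_type = "UUAGCAAAAGGAAUA"
optimised = codon_optimise(wild_type)
```

Because only synonymous codons are substituted, `translate(wild_type)` and
`translate(optimised)` give the same amino acid sequence although the nucleotide
sequences differ.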

Thus preferably, the sequences encoding the packaging components are codon optimised. 
More preferably, the sequences are codon optimised in their entirety. Following codon 
optimisation, it is found that there are numerous sites in the wild type gag, pol and env 
sequences which can serve as inhibitory RNA recognition sites and which are no longer 
present in the sequences encoding the packaging components. In an alternative but less
practical strategy, the sequences encoding the packaging components can be altered by 
targeted conservative alterations so as to render them resistant to selected inhibitory RNAs 
capable of effecting the cleavage of the wild type sequences. 



An additional advantage of codon optimising HIV packaging components is that this can 
increase gene expression. In particular, it can render gag, pol expression Rev independent 
so that rev and RRE need not be included in the genome (Haas et al., 1996).
Rev-independent vectors are therefore possible. This in turn enables the use of anti-rev or RRE
factors in the retroviral vector. 

As described above, the packaging components for a retroviral vector include expression
products of gag, pol and env genes. In accordance with the present invention, gag and pol
employed in the packaging system are derived from the target retrovirus on which the
vector genome is based. Thus, in the RNA transcript form, gag and pol would normally be
cleavable by the ribozymes present in the vector genome. The env gene employed in the
packaging system may be derived from a different virus, including other retroviruses such
as MLV and non-retroviruses such as VSV (a Rhabdovirus), in which case it may not need
any sequence alteration to render it resistant to cleavage effected by the inhibitory RNA(s).
Alternatively, env may be derived from the same retrovirus as gag and pol, in which case
any recognition sites for the inhibitory RNA(s) will need to be eliminated by sequence
alteration.



The process of producing a retroviral vector in which the envelope protein is not the native
envelope of the retrovirus is known as "pseudotyping". Certain envelope proteins, such as
MLV envelope protein and vesicular stomatitis virus G (VSV-G) protein, pseudotype
retroviruses very well. Pseudotyping can be useful for altering the target cell range of the
retrovirus. Alternatively, to maintain target cell specificity for target cells infected with the
particular virus it is desired to treat, the envelope protein may be the same as that of the
target virus, for example HIV.


Other therapeutic coding sequences may be present along with the first nucleotide sequence
or sequences. Other therapeutic coding sequences include, but are not limited to,
sequences encoding cytokines, hormones, antibodies, immunoglobulin fusion proteins,
enzymes, immune co-stimulatory molecules, anti-sense RNA, a transdominant negative
mutant of a target protein, a toxin, a conditional toxin, an antigen, a single chain antibody,
tumour suppressor protein and growth factors. When included, such coding sequences are
operatively linked to a suitable promoter, which may be the promoter driving expression of
the first nucleotide sequence or a different promoter or promoters.

Thus the invention comprises two components. The first is a genome construction that will
be packaged by viral packaging components and which carries a series of anti-viral
inhibitory RNA molecules such as anti-HIV EGSs. These could be any anti-HIV EGSs but
the key issue for this invention is that some of them result in cleavage of RNA that is
required for the expression of native or wild type HIV gag, pol or env coding sequences.

The second component is the packaging system which comprises a cassette for the
expression of HIV gag, pol and a cassette either for HIV env or an envelope gene encoding
a pseudotyping envelope protein - the packaging system being resistant to the inhibitory
RNA molecules.

The viral particles of the present invention, and the viral vector system and methods used
to produce them, may thus be used to treat or prevent viral infections, preferably retroviral
infections, in particular lentiviral, especially HIV, infections. Specifically, the viral
particles of the invention, typically produced using the viral vector system of the present
invention, may be used to deliver inhibitory RNA molecules to a human or animal in need
of treatment for a viral infection.

Alternatively, or in addition, the viral production system may be used to transfect cells
obtained from a patient ex vivo, which are then returned to the patient. Patient cells
transfected ex vivo may be formulated as a pharmaceutical composition (see below) prior to
readministration to the patient.

Preferably the viral particles are combined with a pharmaceutically acceptable carrier or
diluent to produce a pharmaceutical composition. Thus, the present invention also provides
a pharmaceutical composition for treating an individual, wherein the composition
comprises a therapeutically effective amount of the viral particle of the present invention,
together with a pharmaceutically acceptable carrier, diluent, excipient or adjuvant. The
pharmaceutical composition may be for human or animal usage.


The choice of pharmaceutical carrier, excipient or diluent can be selected with regard to the 
intended route of administration and standard pharmaceutical practice. Suitable carriers and 
diluents include isotonic saline solutions, for example phosphate-buffered saline. The 
pharmaceutical compositions may comprise as - or in addition to - the carrier, excipient or 
diluent any suitable binder(s), lubricant(s), suspending agent(s), coating agent(s),
solubilising agent(s), and other carrier agents that may aid or increase the viral entry into 
the target site (such as for example a lipid delivery system). 

The pharmaceutical composition may be formulated for parenteral, intramuscular, 
intravenous, intracranial, subcutaneous, oral, intraocular or transdermal administration.

Where appropriate, the pharmaceutical compositions can be administered by any one or
more of: inhalation, in the form of a suppository or pessary, topically in the form of a
lotion, solution, cream, ointment or dusting powder, by use of a skin patch, orally in the
form of tablets containing excipients such as starch or lactose, or in capsules or ovules
either alone or in admixture with excipients, or in the form of elixirs, solutions or
suspensions containing flavouring or colouring agents, or they can be injected parenterally,
for example intracavernosally, intravenously, intramuscularly or subcutaneously. For
parenteral administration, the compositions may be best used in the form of a sterile
aqueous solution which may contain other substances, for example enough salts or
monosaccharides to make the solution isotonic with blood. For buccal or sublingual
administration the compositions may be administered in the form of tablets or lozenges
which can be formulated in a conventional manner.

The amount of virus administered is typically in the range of from 10^ to 10^*^ pfu,
preferably from 10^ to 10^ pfu, more preferably from 10^ to lO'' pfu. When injected,
typically 1-10 µl of virus in a pharmaceutically acceptable suitable carrier or diluent is
administered.

When the polynucleotide/vector is administered as a naked nucleic acid, the amount of
nucleic acid administered is typically in the range of from 1 µg to 10 mg, preferably from
100 µg to 1 mg.


Where the first nucleotide sequence (or other therapeutic sequence) is under the control of 
an inducible regulatory sequence, it may only be necessary to induce gene expression for 
the duration of the treatment. Once the condition has been treated, the inducer is removed 
and expression of the NOI is stopped. This will clearly have clinical advantages. Such a 
system may, for example, involve administering the antibiotic tetracycline, to activate gene
expression via its effect on the tet repressor/VP16 fusion protein.

The invention will now be further described by way of Examples, which are meant to serve 
to assist one of ordinary skill in the art in carrying out the invention and are not intended in 
any way to limit the scope of the invention. The Examples refer to the Figures. In the
Figures: 




Figure 1 shows schematically ribozymes inserted into four different HIV vectors;

Figure 2 shows schematically how to create a suitable 3' LTR by PCR;

Figure 3 shows the codon usage table for wild type HIV gag,pol of strain HXB2 (accession
number: K03455).

Figure 4 shows the codon usage table of the codon optimised sequence designated
gag,pol-SYNgp.

Figure 5 shows the codon usage table of the wild type HIV env called env-mn.

Figure 6 shows the codon usage table of the codon optimised sequence of HIV env
designated SYNgp160mn.

Figure 7 shows three plasmid constructs for use in the invention.

Figure 8 shows the principle behind two systems for producing retroviral vector particles.

Figure 9A shows an EGS based on tyrosyl t-RNA.

Figure 9B shows a consensus EGS sequence.

Figure 10 shows twelve different anti-HIV EGS constructs.

Figure 11 is a schematic representation of pDozenEgs and construction of pH4DozenEgs.

The invention will now be further described in the Examples which follow, which are 
intended as an illustration only and do not limit the scope of the invention. 

Reference Example 1 - Construction of a Ribozyme-encoding Genome

The HIV gagpol sequence was codon optimised (Figure 4 and SEQ I.D. No. 1) and
synthesised using overlapping oligos of around 40 nucleotides. This has three advantages.
Firstly, it allows an HIV-based vector to carry ribozymes and other therapeutic factors.
Secondly, the codon optimisation generates a higher vector titre due to a higher level of
gene expression. Thirdly, gagpol expression becomes rev independent, which allows the
use of anti-rev or RRE factors.

Conserved sequences within gagpol were identified by reference to the HIV Sequence
Database at Los Alamos National Laboratory (http://hiv-web.lanl.gov/) and used to design
ribozymes. Because of the variability between subtypes of HIV-1, the ribozymes were
designed to cleave the predominant subtype within North America, Latin America and the
Caribbean, Europe, Japan and Australia; that is, subtype B. The sites chosen were
cross-referenced with the synthetic gagpol sequence to ensure that there was a low possibility of
cutting the codon optimised gagpol mRNA. The ribozymes were designed with XhoI and
SalI sites at the 5' and 3' ends respectively. This allows the construction of separate and
tandem ribozymes.
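
The cross-referencing of a chosen site against the synthetic gagpol sequence can be illustrated with a short sketch. It compares the GAG 1 target site from this Example with the 22-nucleotide window at the same position (nucleotides 818-839) of the wild type and codon optimised gagpol sequences, both copied from the sequence listing. The function names and the use of a plain mismatch count are assumptions made for illustration only; the specification does not state how the comparison was actually carried out.

```python
def rna_to_dna(rna: str) -> str:
    """Convert an RNA string to its DNA spelling (U -> T)."""
    return rna.upper().replace("U", "T")


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


# GAG 1 target site as listed in this Example (RNA, 5' to 3').
GAG1_TARGET_RNA = "UAGUAAGAAUGUAUAGCCCUAC"

# Windows copied from the sequence listing, nucleotides 818-839.
WILD_TYPE_WINDOW = "TAGTAAGAATGTATAGCCCTAC"   # SEQ ID No. 1 (HXB2)
OPTIMISED_WINDOW = "TCGTGCGCATGTATAGCCCTAC"   # SEQ ID No. 2 (gagpol-SYNgp)

target = rna_to_dna(GAG1_TARGET_RNA)
wild_type_mismatches = mismatches(target, WILD_TYPE_WINDOW)
optimised_mismatches = mismatches(target, OPTIMISED_WINDOW)
print(wild_type_mismatches, optimised_mismatches)
```

In this window the wild type sequence matches the target exactly, whereas the codon optimised sequence differs at four positions, all of them upstream of the GUA triplet.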

The ribozymes are hammerhead (Riddell et al., 1996) structures of the following general
structure:

   Helix I         Helix II        Helix III
5'-NNNNNNNN-CUGAUGAGGCCGAAAGGCCGAA-NNNNNNNN

The catalytic domain of the ribozyme (Helix II) can tolerate some changes without
reducing catalytic turnover.

The cleavage sites, targeting gag and pol, with the essential GUX triplet (where X is any
nucleotide base) are as follows:

GAG 1   5' UAGUAAGAAUGUAUAGCCCUAC
GAG 2   5' AACCCAGAUUGUAAGACUAUUU
GAG 3   5' UGUUUCAAUUGUGGCAAAGAAG
GAG 4   5' AAAAAGGGCUGUUGGAAAUGUG
POL 1   5' ACGACCCCUCGUCACAAUAAAG
POL 2   5' GGAAUUGGAGGUUUUAUCAAAG
POL 3   5' AUAUUUUUCAGUUCCCUUAGAU
POL 4   5' UGGAUGAUUUGUAUGUAGGAUC
POL 5   5' CUUUGGAUGGGUUAUGAACUCC
POL 6   5' CAGCUGGACUGUCAAUGACAUA
POL 7   5' AACUUUCUAUGUAGAUGGGGCA
POL 8   5' AAGGCCGCCUGUUGGUGGGCAG
POL 9   5' UAAGACAGCAGUACAAAUGGCA
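
As a simple illustration of the GUX requirement, the sketch below scans a target site for every GU dinucleotide that is followed by at least one further base. The function name is hypothetical and the scan is only a convenience check on the listed sites; it does not reproduce whatever design software or manual procedure was used to select them.

```python
def find_gux_sites(rna: str) -> list[int]:
    """Return 0-based start positions of every GUX triplet (X = any base)."""
    rna = rna.upper()
    # Stop two bases before the end so that X is always present.
    return [i for i in range(len(rna) - 2) if rna[i:i + 2] == "GU"]


# Two of the sites listed above.
GAG1 = "UAGUAAGAAUGUAUAGCCCUAC"
POL9 = "UAAGACAGCAGUACAAAUGGCA"

gag1_sites = find_gux_sites(GAG1)
pol9_sites = find_gux_sites(POL9)
print(gag1_sites, pol9_sites)
```

For both sites a GUX triplet starts at position 10 (0-based), near the centre of the 22-nucleotide sequence, which leaves flanking sequence on either side for the ribozyme arms to pair with.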

The ribozymes are inserted into four different HIV vectors (pH4 (Gervaix et al., 1997),
pH6, pH4.1, or pH6.1) (Figure 1). In pH4 and pH6, transcription of the ribozymes is
driven by an internal HCMV promoter (Foecking et al., 1986). From pH4.1 and pH6.1, the
ribozymes are expressed from the 5' LTR. The major difference between pH4 and pH6
(and pH4.1 and pH6.1) resides in the 3' LTR in the production plasmid. pH4 and pH4.1
have the HIV U3 in the 3' LTR. pH6 and pH6.1 have HCMV in the 3' LTR. The HCMV
promoter replaces most of the U3 and will drive expression at high constitutive levels
while the HIV-1 U3 will support a high level of expression only in the presence of Tat.

The HCMV/HIV-1 hybrid 3' LTR is created by recombinant PCR with three PCR primers
(Figure 2). The first round of PCR is performed with RIB1 and RIB2 using pH4 (Kim et
al., 1998) as the template to amplify the HIV-1 HXB2 sequence 8900-9123. The second
round of PCR makes the junction between the 5' end of the HIV-1 U3 and the HCMV
promoter by amplifying the hybrid 5' LTR from pH4. The PCR product from the first PCR
reaction and RIB3 serve as the 5' primer and 3' primer respectively.



RIB1   5'-CAGCTGCTCGAGCAGCTGAAGCTTGCATGC-3'
RIB2   5'-GTAAGTTATGTAACGGACGATATCTTGTCTTCTT-3'
RIB3   5'-CGCATAGTCGACGGGCCCGCCACTGCTAGAGATTTTC-3'

The PCR product is then cut with SphI and SalI and inserted into pH4, thereby replacing the
3' LTR. The resulting plasmid is designated pH6. To construct pH4.1 and pH6.1, the
internal HCMV promoter (SpeI - XhoI) in pH4 and pH6 is replaced with the polycloning
site of pBluescript II KS+ (Stratagene) (SpeI - XhoI).
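
The restriction sites carried by the outer primers can be checked with a few lines of code. The sketch below looks for the XhoI, SalI and SphI recognition sequences in RIB1 and RIB3 as given above; the helper name and the dictionary layout are illustrative assumptions, not part of the described method.

```python
# Recognition sequences of the enzymes named in the cloning steps.
RECOGNITION_SITES = {"XhoI": "CTCGAG", "SalI": "GTCGAC", "SphI": "GCATGC"}

RIB1 = "CAGCTGCTCGAGCAGCTGAAGCTTGCATGC"
RIB3 = "CGCATAGTCGACGGGCCCGCCACTGCTAGAGATTTTC"


def find_sites(seq: str) -> dict[str, list[int]]:
    """Map each enzyme found in seq to the 0-based start positions of its site."""
    seq = seq.upper()
    hits: dict[str, list[int]] = {}
    for enzyme, site in RECOGNITION_SITES.items():
        start = seq.find(site)
        while start != -1:
            hits.setdefault(enzyme, []).append(start)
            start = seq.find(site, start + 1)
    return hits


rib1_sites = find_sites(RIB1)
rib3_sites = find_sites(RIB3)
print(rib1_sites, rib3_sites)
```

RIB1 carries an SphI site (and an XhoI site) and RIB3 carries a SalI site, which is consistent with the PCR product being cut with SphI and SalI before insertion into pH4.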

The ribozymes are inserted into the XhoI sites in the genome vector backbones. Any
ribozymes in any configuration could be used in a similar way.

Reference Example 2 - Construction of a Packaging System

The packaging system can take various forms. In a first form of packaging system, the HIV
gag, pol components are co-expressed with the HIV env coding sequence. In this case,
both the gag, pol and the env coding sequences are altered such that they are resistant to the
anti-HIV ribozymes that are built into the genome. At the same time as altering the codon
usage to achieve resistance, the codons can be chosen to match the usage pattern of the
most highly expressed mammalian genes. This dramatically increases expression levels
and so increases titre. A codon optimised HIV env coding sequence has been described by
Haas et al. (1996). In the present example, a modified codon optimised HIV env sequence
is used (SEQ I.D. No. 3). The corresponding env expression plasmid is designated
pSYNgp160mn. The modified sequence contains extra motifs not used by Haas et al. The
extra sequences were taken from the HIV env sequence of strain MN and codon optimised.
Any similar modification of the nucleic acid sequence would function similarly as long as
it used codons corresponding to abundant tRNAs (Zolotukhin et al., 1996) and led to
resistance to the ribozymes in the genome.
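
The idea of re-encoding a protein with preferred codons can be sketched as follows. The code translates a coding sequence with the standard genetic code and writes each amino acid back with a single preferred codon. The `PREFERRED` table is a hypothetical one-codon-per-amino-acid choice made for illustration; it is not the table used to design gagpol-SYNgp or SYNgp160mn, which, for example, retain wild type sequence around the frameshift site.

```python
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Hypothetical preference table: one GC-rich codon per amino acid (assumption).
PREFERRED = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC", "Q": "CAG",
    "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC", "L": "CTG", "K": "AAG",
    "M": "ATG", "F": "TTC", "P": "CCC", "S": "AGC", "T": "ACC", "W": "TGG",
    "Y": "TAC", "V": "GTG", "*": "TGA",
}


def translate(dna: str) -> str:
    """Translate a coding sequence, ignoring any incomplete trailing codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(GENETIC_CODE[dna[i:i + 3]] for i in range(0, usable, 3))


def optimise(dna: str) -> str:
    """Re-encode the protein with one preferred codon per amino acid."""
    return "".join(PREFERRED[aa] for aa in translate(dna))


wild_type_start = "ATGGGTGCGAGA"   # first 12 nt of SEQ ID No. 1
optimised_start = optimise(wild_type_start)
print(optimised_start)
```

For these first four codons the result happens to coincide with the start of the codon optimised sequence in SEQ ID No. 2, while the encoded amino acids are unchanged.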

In one example of a gag, pol coding sequence with optimised codon usage, overlapping
oligonucleotides are synthesised and then ligated together to produce the synthetic coding
sequence. The sequences of a wild-type (Genbank accession no. K03455) and synthetic
(gagpol-SYNgp) gagpol sequence are shown in SEQ I.D. Nos 1 and 2, respectively, and their
codon usage is shown in Figures 3 and 4, respectively. The sequence of a wild type env
coding sequence (Genbank Accession No. M17449) is given in SEQ I.D. No. 3, the
sequence of a synthetic codon optimised sequence is given in SEQ I.D. No. 4 and their
codon usage tables are given in Figures 5 and 6, respectively. As with the env coding
sequence, any gag, pol sequence that achieves resistance to the ribozymes could be used.
The synthetic sequence shown is designated gagpol-SYNgp and has an EcoRI site at the 5'
end and a NotI site at the 3' end. It is inserted into pCI-neo (Promega) to produce plasmid
pSYNgp.
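
A codon usage comparison of the kind summarised in the Figures can be sketched in a few lines. The example below counts codons in the first 60 nucleotides of the wild type and codon optimised gagpol sequences (copied from the sequence listing) and reports the fraction of codons ending in G or C. The helper names are hypothetical, and this third-position GC fraction is only a simple stand-in; it is not the CAI statistic shown in the Figures.

```python
from collections import Counter


def codons(dna: str) -> list[str]:
    """Split a coding sequence into complete codons, ignoring spaces."""
    dna = dna.replace(" ", "").upper()
    return [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]


def codon_usage(dna: str) -> Counter:
    """Count how often each codon occurs."""
    return Counter(codons(dna))


def gc3(dna: str) -> float:
    """Fraction of codons whose third base is G or C."""
    triplets = codons(dna)
    return sum(c[2] in "GC" for c in triplets) / len(triplets)


# First 60 nt of SEQ ID No. 1 (wild type) and SEQ ID No. 2 (gagpol-SYNgp).
WILD_TYPE = "ATGGGTGCGA GAGCGTCAGT ATTAAGCGGG GGAGAATTAG ATCGATGGGA AAAAATTCGG"
OPTIMISED = "ATGGGCGCCC GCGCCAGCGT GCTGTCGGGC GGCGAGCTGG ACCGCTGGGA GAAGATCCGC"

wild_type_usage = codon_usage(WILD_TYPE)
wild_type_gc3 = gc3(WILD_TYPE)
optimised_gc3 = gc3(OPTIMISED)
print(wild_type_usage.most_common(3), wild_type_gc3, optimised_gc3)
```

Over these 20 codons every codon of the optimised sequence ends in G or C, compared with 7 of 20 in the wild type sequence.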

The sequence of the codon optimised gagpol sequence is shown in SEQ I.D. No. 2. This
sequence starts at the ATG and ends at the stop codon of gagpol. The wild type sequence
is retained around the frameshift site so that the right amount of gagpol is made.

In addition, other constructs can be used that contain the optimised gagpol of pSYNgp but
also have differing amounts of the wild type HIV-1 sequence of strain HXB2 (accession
number: K03455) at the 5' end. These constructs are described below (the start ATG of
pSYNgp is shown in bold in these sequences).

pSYNgp2 contains the entire leader sequence of HIV-1 (SEQ ID. No. 12).

pSYNgp3 contains the leader sequence of HIV-1 from the major splice donor (SEQ ID.
No. 13).

pSYNgp4 contains 20 bp of the leader sequence of HIV-1 upstream of the ATG start
codon (SEQ ID. No. 14).

These constructs may be made by overlapping PCR. Using appropriate restriction enzymes,
these sequences can be inserted into mammalian expression vectors such as pCI-neo
(Promega). All these gag/pol constructs can be used to supply HIV gag/pol for the
generation of viral vectors. These viral vectors can be used to express either EGS
molecules or ribozyme molecules or antisense molecules or any peptides or proteins.

In a second form of the packaging system a synthetic gag, pol cassette is coexpressed with
a non-HIV envelope coding sequence that produces a surface protein that pseudotypes
HIV. This could be, for example, VSV-G (Ory et al., 1996; Zhu et al., 1990), amphotropic
MLV env (Chesebro et al., 1990; Spector et al., 1990) or any other protein that would be
incorporated into the HIV particle (Valsesia-Wittman, 1994). This includes molecules
capable of targeting the vector to specific tissues. Coding sequences for non-HIV envelope
proteins are not cleaved by the ribozymes and so no sequence modification is required
(although some sequence modification may be desirable for other reasons such as
optimisation for codon usage in mammalian cells).

Reference Example 3 - Vector Particle Production

Vector particles can be produced either from a transient three-plasmid transfection system
similar to that described by Soneoka et al. (1995) or from producer cell lines similar to
those used for other retroviral vectors (Ory et al., 1996; Srinivasakumar et al., 1991; Yu et
al., 1996). These principles are illustrated in Figures 7 and 8. For example, by using
pH6Rz, pSYNgp and pRV67 (VSV-G expression plasmid) in a three plasmid transfection
of 293T cells (Figure 8), as described by Soneoka et al. (1995), vector particles designated
H6Rz-VSV are produced. These transduce the H6Rz genome to CD4+ cells such as
C1866 or Jurkat and produce the multitarget ribozymes. HIV replication in these cells is
now severely restricted.

Example 1 - Use of external guide sequences for inhibiting HIV

Ribonuclease P is a nuclear localised enzyme consisting of protein and RNA subunits. It
has been found in all organisms examined and is one of the most abundant, stable and
efficient enzymes in cells. Its enzymatic activity is responsible for the maturation of the 5'
termini of all tRNAs, which account for about 2% of the total cellular RNA.

For tRNA processing, it has been shown that RNase P recognises a secondary structure of
the tRNA. However, extensive studies have shown that any complex of two RNA
molecules which resembles the one tRNA molecule will also be recognised and cleaved by
RNase P. Consequently, the natural activity of RNase P can be, and has been, successfully
redirected to target other RNA species (see Yuan and Altman, 1994, and references therein).
This is achieved by engineering a sequence, containing the flanking motif recognised by
RNase P, to bind the desired target sequence. These sequences are called external guide
sequences (EGSs).

Outlined here is a strategy employing the EGS system against HIV RNA. Shown in Figure
2 A, B and C are twelve EGS sequences designed to target twelve separate HIV gag/pol
sequences. These target sequences are conserved throughout clade B of HIV. The
sequence numbering in each figure designates the position of the required conserved G of
each target sequence based on the HXB2 published sequence.

The external guide sequences shown here all have anticodon stem-loops deleted. These are
non-limiting examples; for instance full length 3/4 tRNA based EGSs might be used if
preferred (see Yuan and Altman, 1994).

Outlined in SEQ ID. Nos. 5 to 10 (see below) and Figure 11 is the cloning strategy
employed to construct an HIV vector containing the EGSs described in SEQ ID. Nos. 5 to
10. The oligonucleotides prefixed 1, 2, 3, 4, 5 and 6 are respectively annealed together and
sequentially cloned into the pSP72 (Promega) cloning vector, starting with the oligo duplex
1/1A being cloned into the XhoI-SalI site such that the EGS 4762 and EGS 4715 are
orientated away from the ampicillin gene. The remaining oligonucleotides (with XhoI ends)
are subsequently cloned stepwise (starting with oligo duplex 2/2A, ending with duplex
6/6A) into the unique SalI site (present within the terminus of each preceding
oligonucleotide) to create the plasmid pDOZENEGS. The EGSs from this vector are then
transferred by XhoI-SphI digest into pH4Z, similarly cut, such that the multiple EGS
cassette replaces the lacZ gene of pH4Z (Kim et al., 1998). The resulting vector is named
pH4DOZENEGS (see SEQ ID. No. 11 for complete sequence).
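
Whether two annealed oligonucleotides form a clean duplex with cohesive ends can be checked mechanically. The sketch below uses the Egs 4/4A pair from SEQ ID. No. 8, with the lower strand written 3' to 5' as it is printed under the upper strand. The function name and the fixed four-base overhang are assumptions made for this illustration.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")

# Egs 4/4A duplex (SEQ ID. No. 8). The lower strand is written 3' to 5',
# aligned under the upper strand after its four-base 5' overhang.
EGS4_TOP = (
    "tcgagggctacgtcatcgacttcgaaggttcgaatccttcttgcttcaccatttttt"
    "ctgaacgtcatcgacttcgaaggttcgaatccttctgctgtcaccagtcgacc"
)
EGS4_BOTTOM_3TO5 = (
    "cccgatgcagtagctgaagcttccaagcttaggaagaacgaagtggtaaaaaa"
    "gacttgcagtagctgaagcttccaagcttaggaagacgacagtggtcagctggagct"
)


def pairs_with_overhangs(top: str, bottom_3to5: str, overhang: int = 4) -> bool:
    """True if the strands are fully complementary apart from a 5' overhang on each."""
    top, bottom = top.lower(), bottom_3to5.lower()
    paired_top = top[overhang:]          # drop the upper strand's 5' overhang
    paired_bottom = bottom[:-overhang]   # drop the lower strand's 5' overhang
    return paired_top.translate(COMPLEMENT) == paired_bottom


duplex_ok = pairs_with_overhangs(EGS4_TOP, EGS4_BOTTOM_3TO5)
top_overhang = EGS4_TOP[:4]
bottom_overhang = EGS4_BOTTOM_3TO5[-4:][::-1]   # read 5' to 3'
print(duplex_ok, top_overhang, bottom_overhang)
```

Both strands end in a 5' TCGA overhang, which is the cohesive end left by XhoI or SalI digestion, so the duplex can be ligated into a SalI-cut vector as described for the stepwise cloning.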

Egs 1/1A (SEQ ID. No. 5)

   XhoI
5'-tcgagcccggggatgacgtcatcgacttcgaaggttcgaatccttctactgccaccatttttt
       cgggcccctactgcagtagctgaagcttccaagcttaggaagatgacggtggtaaaaaa

   ctctacgtcatcgacttcgaaggttcgaatccttccctgtccaccagtcgacc-3'
   gagatgcagtagctgaagcttccaagcttaggaagggacaggtggtcagctggagct-5'

Egs 2/2A (SEQ ID. No. 6)

5'-tcgagtattacgtcatcgacttcgaaggttcgaatccttctagattcaccattttttaggaacg
       cataatgcagtagctgaagcttccaagcttaggaagtactaagtggtaaaaaatccttgc

   tcatcgacttcgaaggttcgaatccttccagttccaccagtcgacc-3'
   agtagctgaagcttccaagcttaggaaggtcaaggtggtcagctggagct-5'

Egs 3/3A (SEQ ID. No. 7)

5'-tcgaggccaacgtcatcgacttcgaaggttcgaatccttctcttcccaccattttttttcc
       ccggttgcagtagctgaagcttccaagcttaggaagagaagggtggtaaaaaaaagg

   acgtcatcgacttcgaaggttcgaatccttcggggcccaccagtcgacc-3'
   tgcagtagctgaagcttccaagcttaggaagccccgggtggtcagctggagct-5'

Egs 4/4A (SEQ ID. No. 8)

5'-tcgagggctacgtcatcgacttcgaaggttcgaatccttcttgcttcaccatttttt
       cccgatgcagtagctgaagcttccaagcttaggaagaacgaagtggtaaaaaa

   ctgaacgtcatcgacttcgaaggttcgaatccttctgctgtcaccagtcgacc-3'
   gacttgcagtagctgaagcttccaagcttaggaagacgacagtggtcagctggagct-5'

Egs 5/5A (SEQ ID. No. 9)

5'-tcgagtataacgtcatcgacttcgaaggttcgaatccttcaccggtcaccatttttttata
       catattgcagtagctgaagcttccaagcttaggaagtggccagtggtaaaaaaatat

   acgtcatcgacttcgaaggttcgaatccttcttcttacaccagtcgacc-3'
   tgcagtagctgaagcttccaagcttaggaagaagaatgtggtcagctggagct-5'

Egs 6/6A (SEQ ID. No. 10)

5'-tcgaggtacacgtcatcgacttcgaaggttcgaatccttcgtagttcaccattttttgtgc
       ccatgtgcagtagctgaagcttccaagcttaggaagcatcaagtggtaaaaaacacg

                                                    SphI
   acgtcatcgacttcgaaggttcgaatccttctaggcccaccagtcgacgcatgcc-3'
   tgcagtagctgaagcttccaagcttaggaagatccgggtggtcagctgcgtacggagct-5'

The pH4DOZENEGS vector may be used to both deliver and express the example EGS
sequences to appropriate eukaryotic cells in a manner as described for ribozymes in
Reference Examples 1, 2 and 3, whereby the use of codon optimised gag/pol and env genes
would prevent EGSs from targeting these genes during viral production. The inclusion of
the EGS sequences into an HIV derived vector will not only allow expression of such
sequences in the target cell but also packaging and transfer of such therapeutic sequences
by the patient's own HIV. These example EGS sequences target HIV RNA for cleavage by
RNase P. This example is not limiting and other suitable EGS and derived sequences may
also be used, be they expressed singly, in multiples, from pol I, pol II or pol III
promoters and derivatives thereof and/or in combination with other HIV treatments. Other
appropriate nucleotide sequences of interest (NOIs) may also be included in combination
with EGSs if preferred.

All publications mentioned in the above specification are herein incorporated by reference.
Various modifications and variations of the described methods and system of the invention
will be apparent to those skilled in the art without departing from the scope and spirit of the
invention. Although the invention has been described in connection with specific preferred
embodiments, it should be understood that the invention as claimed should not be unduly
limited to such specific embodiments. Indeed, various modifications of the described
modes for carrying out the invention which are obvious to those skilled in molecular
biology or related fields are intended to be within the scope of the following claims.

CLAIMS

1. A viral vector system comprising:

(i) a first nucleotide sequence encoding an external guide sequence capable of binding 
to and effecting the cleavage by RNase P of a second nucleotide sequence, or transcription 
product thereof, encoding a viral polypeptide required for the assembly of viral particles; 
and 

(ii) a third nucleotide sequence encoding said viral polypeptide required for the 
assembly of viral particles, which third nucleotide sequence has a different nucleotide 
sequence to the second nucleotide sequence such that the third nucleotide sequence, or 
transcription product thereof, is resistant to cleavage directed by the external guide 
sequence. 

2. A system according to claim 1 further comprising at least one further first 
nucleotide sequence encoding a gene product capable of binding to and effecting the 
cleavage, directly or indirectly, of a second nucleotide sequence, or transcription product 
thereof, encoding a viral polypeptide required for the assembly of viral particles, wherein
the gene product is selected from an external guide sequence, a ribozyme and an anti-sense 
ribonucleic acid. 

3. A viral vector production system comprising: 

(i) a viral genome comprising at least one first nucleotide sequence encoding a gene 
product capable of binding to and effecting the cleavage, directly or indirectly, of a second 
nucleotide sequence, or transcription product thereof, encoding a viral polypeptide required 
for the assembly of viral particles; 

(ii) a third nucleotide sequence encoding said viral polypeptide required for the 
assembly of the viral genome into viral particles, which third nucleotide sequence has a 
different nucleotide sequence to the second nucleotide sequence such that said third 
nucleotide sequence, or transcription product thereof, is resistant to cleavage directed by 
said gene product; 

wherein at least one of the gene products is an external guide sequence capable of binding
to and effecting the cleavage by RNase P of the second nucleotide sequence.

4. A system according to claim 3 wherein in addition to an external guide sequence, at
least one gene product is selected from a ribozyme and an anti-sense ribonucleic acid.

5. A system according to any one of claims 1 to 4 wherein the viral vector is a 
retroviral vector. 

6. A system according to claim 5 wherein the retroviral vector is a lentiviral vector.

7. A system according to claim 6 wherein the lentiviral vector is an HIV vector.

8. A system according to any one of claims 5 to 7 wherein the polypeptide required for 
the assembly of viral particles is selected from gag, pol and env proteins. 

9. A system according to claim 8 wherein at least the gag and pol proteins are from a 
lentivirus. 

10. A system according to claim 7 wherein the env protein is from a lentivirus. 

11. A system according to claim 9 or 10 wherein the lentivirus is HIV.

12. A system according to any one of the preceding claims wherein the third nucleotide 
sequence is resistant to cleavage directed by the gene product as a result of one or more 
conservative alterations in the nucleotide sequence which remove cleavage sites recognised 
by the at least one gene product and/or binding sites for the at least one gene product.

13. A system according to any one of claims 1 to 11 wherein the third nucleotide 
sequence is adapted to be resistant to cleavage by the at least one gene product. 

14. A system according to any one of the preceding claims wherein the third nucleotide 
sequence is codon optimised for expression in producer cells. 



15. A system according to claim 14, wherein the producer cells are mammalian cells.

16. A system according to any one of the preceding claims comprising a plurality of
first nucleotide sequences and third nucleotide sequences as defined therein.

17. A viral particle comprising a viral vector genome as defined in any one of claims 3 
to 16 and one or more third nucleotide sequences as defined in any of claims 3 to 16. 

18. A viral particle produced using a viral vector production system according to any 
one of claims 3 to 16. 

19. A method for producing a viral particle which method comprises introducing into a 
host cell (i) a viral genome as defined in any one of claims 3 to 16, (ii) one or more third
nucleotide sequences as defined in any of claims 3 to 16 and (iii) nucleotide sequences 
encoding the other essential viral packaging components not encoded by the one or more 
third nucleotide sequences. 

20. A viral particle produced by the method of claim 19.

21. A pharmaceutical composition comprising a viral particle according to claims 17,
18 or 20 together with a pharmaceutically acceptable carrier or diluent.

22. A viral system according to any one of claims 1 to 17 or a viral particle according to
claims 17, 18 or 20 in treating a viral infection.

23. A viral system according to any one of claims 1 to 17 for use in a method of
producing viral particles.

Figure 2

Figure 3

gagpol-HXB2 -> Codon Usage

DNA sequence 4308 b.p. ATGGGTGCGAGA ... GATGAGGATTAG linear

1436 codons

MW : 161929 Dalton CAI(S.c.) : 0.083 CAI(E.c.) : 0.151

Figure 4

gagpol-SYNgp [1 to 4308] -> Codon Usage

DNA sequence 4308 b.p. ATGGGCGCCCGC ... GATGAGGATTAG linear

1436 codons

MW : 161929 Dalton CAI(S.c.) : 0.080 CAI(E.c.) : 0.295

Figure 5

env-mn [1 to 2571] -> Codon Usage

DNA sequence 2571 b.p. ATGAGAGTGAAG ... GCTTTGCTATAA linear

857 codons

MW : 97078 Dalton CAI(S.c.) : 0.083 CAI(E.c.) : 0.140

Figure 6

SYNgp160mn -> Codon Usage

DNA sequence 2571 b.p.

857 codons

MW : 97078 Dalton CAI(S.c.) : 0.074 CAI(E.c.) : 0.419

Figure 9A

Figure 9B

Generic design of EGSs to target any RNA.

SEQUENCE LISTING PART OF THE DESCRIPTION



SEQ. ID. NO. 1 - Wild type gagpol sequence for strain HXB2 (accession no. K03455) 

ATGGGTGCGA GAGCGTCAGT ATTAAGCGGG GGAGAATTAG ATCGATGGGA AAAAATTCGG 60
TTAAGGCCAG GGGGAAAGAA AAAATATAAA TTAAAACATA TAGTATGGGC AAGCAGGGAG 120 
CTAGAACGAT TCGCAGTTAA TCCTGGCCTG TTAGAAACAT CAGAAGGCTG TAGACAAATA 180 
CTGGGACAGC TACAACCATC CCTTCAGACA GGATCAGAAG AACTTAGATC ATTATATAAT 240
ACAGTAGCAA CCCTCTATTG TGTGCATCAA AGGATAGAGA TAAAAGACAC CAAGGAAGCT 300
TTAGACAAGA TAGAGGAAGA GCAAAACAAA AGTAAGAAAA AAGCACAGCA AGCAGCAGCT 360 
GACACAGGAC ACAGCAATCA GGTCAGCCAA AATTACCCTA TAGTGCAGAA CATCCAGGGG 420 
CAAATGGTAC ATCAGGCCAT ATCACCTAGA ACTTTAAATG CATGGGTAAA AGTAGTAGAA 480 
GAGAAGGCTT TCAGCCCAGA AGTGATACCC ATGTTTTCAG CATTATCAGA AGGAGCCACC 540
CCACAAGATT TAAACACCAT GCTAAACACA GTGGGGGGAC ATCAAGCAGC CATGCAAATG 600 
TTAAAAGAGA CCATCAATGA GGAAGCTGCA GAATGGGATA GAGTGCATCC AGTGCATGCA 660 
GGGCCTATTG CACCAGGCCA GATGAGAGAA CCAAGGGGAA GTGACATAGC AGGAACTACT 720 
AGTACCCTTC AGGAACAAAT AGGATGGATG ACAAATAATC CACCTATCCC AGTAGGAGAA 780 
ATTTATAAAA GATGGATAAT CCTGGGATTA AATAAAATAG TAAGAATGTA TAGCCCTACC 840 
AGCATTCTGG ACATAAGACA AGGACCAAAG GAACCCTTTA GAGACTATGT AGACCGGTTC 900 
TATAAAACTC TAAGAGCCGA GCAAGCTTCA CAGGAGGTAA AAAATTGGAT GACAGAAACC 960 
TTGTTGGTCC AAAATGCGAA CCCAGATTGT AAGACTATTT TAAAAGCATT GGGACCAGCG 1020 
GCTACACTAG AAGAAATGAT GACAGCATGT CAGGGAGTAG GAGGACCCGG CCATAAGGCA 1080 
AGAGTTTTGG CTGAAGCAAT GAGCCAAGTA ACAAATTCAG CTACCATAAT GATGCAGAGA 1140 
GGCAATTTTA GGAACCAAAG AAAGATTGTT AAGTGTTTCA ATTGTGGCAA AGAAGGGCAC 1200
ACAGCCAGAA ATTGCAGGGC CCCTAGGAAA AAGGGCTGTT GGAAATGTGG AAAGGAAGGA 1260 
CACCAAATGA AAGATTGTAC TGAGAGACAG GCTAATTTTT TAGGGAAGAT CTGGCCTTCC 1320 
TACAAGGGAA GGCCAGGGAA TTTTCTTCAG AGCAGACCAG AGCCAACAGC CCCACCAGAA 1380 
GAGAGCTTCA GGTCTGGGGT AGAGACAACA ACTCCCCCTC AGAAGCAGGA GCCGATAGAC 1440 
AAGGAACTGT ATCCTTTAAC TTCCCTCAGG TCACTCTTTG GCAACGACCC CTCGTCACAA 1500 
TAAAGATAGG GGGGCAACTA AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT 1560 
TAGAAGAAAT GAGTTTGCCA GGAAGATGGA AACCAAAAAT GATAGGGGGA ATTGGAGGTT 1620
TTATCAAAGT AAGACAGTAT GATCAGATAC TCATAGAAAT CTGTGGACAT AAAGCTATAG 1680 
GTACAGTATT AGTAGGACCT ACACCTGTCA ACATAATTGG AAGAAATCTG TTGACTCAGA 1740
TTGGTTGCAC TTTAAATTTT CCCATTAGCC CTATTGAGAC TGTACCAGTA AAATTAAAGC 1800 
CAGGAATGGA TGGCCCAAAA GTTAAACAAT GGCCATTGAC AGAAGAAAAA ATAAAAGCAT 1860 
TAGTAGAAAT TTGTACAGAG ATGGAAAAGG AAGGGAAAAT TTCAAAAATT GGGCCTGAAA 1920 
ATCCATACAA TACTCCAGTA TTTGCCATAA AGAAAAAAGA CAGTACTAAA TGGAGAAAAT 1980
TAGTAGATTT CAGAGAACTT AATAAGAGAA CTCAAGACTT CTGGGAAGTT CAATTAGGAA 2040 
TACCACATCC CGCAGGGTTA AAAAAGAAAA AATCAGTAAC AGTACTGGAT GTGGGTGATG 2100 
CATATTTTTC AGTTCCCTTA GATGAAGACT TCAGGAAGTA TACTGCATTT ACCATACCTA 2160 
GTATAAACAA TGAGACACCA GGGATTAGAT ATCAGTACAA TGTGCTTCCA CAGGGATGGA 2220
AAGGATCACC AGCAATATTC CAAAGTAGCA TGACAAAAAT CTTAGAGCCT TTTAGAAAAC 2280 
AAAATCCAGA CATAGTTATC TATCAATACA TGGATGATTT GTATGTAGGA TCTGACTTAG 2340 
AAATAGGGCA GCATAGAACA AAAATAGAGG AGCTGAGACA ACATCTGTTG AGGTGGGGAC 2400
TTACCACACC AGACAAAAAA CATCAGAAAG AACCTCCATT CCTTTGGATG GGTTATGAAC 2460 
TCCATCCTGA TAAATGGACA GTACAGCCTA TAGTGCTGCC AGAAAAAGAC AGCTGGACTG 2520 
TCAATGACAT ACAGAAGTTA GTGGGGAAAT TGAATTGGGC AAGTCAGATT TACCCAGGGA 2580
TTAAAGTAAG GCAATTATGT AAACTCCTTA GAGGAACCAA AGCACTAACA GAAGTAATAC 2640
CACTAACAGA AGAAGCAGAG CTAGAACTGG CAGAAAACAG AGAGATTCTA AAAGAACCAG 2700 
TACATGGAGT GTATTATGAC CCATCAAAAG ACTTAATAGC AGAAATACAG AAGCAGGGGC 2760 
AAGGCCAATG GACATATCAA ATTTATCAAG AGCCATTTAA AAATCTGAAA ACAGGAAAAT 2820 
ATGCAAGAAT GAGGGGTGCC CACACTAATG ATGTAAAACA ATTAACAGAG GCAGTGCAAA 2880
AAATAACCAC AGAAAGCATA GTAATATGGG GAAAGACTCC TAAATTTAAA CTGCCCATAC 2940
AAAAGGAAAC ATGGGAAACA TGGTGGACAG AGTATTGGCA AGCCACCTGG ATTCCTGAGT 3000 
GGGAGTTTGT TAATACCCCT CCCTTAGTGA AATTATGGTA CCAGTTAGAG AAAGAACCCA 3060
TAGTAGGAGC AGAAACCTTC TATGTAGATG GGGCAGCTAA CAGGGAGACT AAATTAGGAA 3120 
AAGCAGGATA TGTTACTAAT AGAGGAAGAC AAAAAGTTGT CACCCTAACT GACACAACAA 3180 
ATCAGAAGAC TGAGTTACAA GCAATTTATC TAGCTTTGCA GGATTCGGGA TTAGAAGTAA 3240 
ACATAGTAAC AGACTCACAA TATGCATTAG GAATCATTCA AGCACAACCA GATCAAAGTG 3300 
AATCAGAGTT AGTCAATCAA ATAATAGAGC AGTTAATAAA AAAGGAAAAG GTCTATCTGG 3360 
CATGGGTACC AGCACACAAA GGAATTGGAG GAAATGAACA AGTAGATAAA TTAGTCAGTG 3420 
CTGGAATCAG GAAAGTACTA TTTTTAGATG GAATAGATAA GGCCCAAGAT GAACATGAGA 3480
AATATCACAG TAATTGGAGA GCAATGGCTA GTGATTTTAA CCTGCCACCT GTAGTAGCAA 3540
AAGAAATAGT AGCCAGCTGT GATAAATGTC AGCTAAAAGG AGAAGCCATG CATGGACAAG 3600
TAGACTGTAG TCCAGGAATA TGGCAACTAG ATTGTACACA TTTAGAAGGA AAAGTTATCC 3660
TGGTAGCAGT TCATGTAGCC AGTGGATATA TAGAAGCAGA AGTTATTCCA GCAGAAACAG 3720
GGCAGGAAAC AGCATATTTT CTTTTAAAAT TAGCAGGAAG ATGGCCAGTA AAAACAATAC 3780
ATACTGACAA TGGCAGCAAT TTCACCGGTG CTACGGTTAG GGCCGCCTGT TGGTGGGCGG 3840
GAATCAAGCA GGAATTTGGA ATTCCCTACA ATCCCCAAAG TCAAGGAGTA GTAGAATCTA 3900
TGAATAAAGA ATTAAAGAAA ATTATAGGAC AGGTAAGAGA TCAGGCTGAA CATCTTAAGA 3960
CAGCAGTACA AATGGCAGTA TTCATCCACA ATTTTAAAAG AAAAGGGGGG ATTGGGGGGT 4020
ACAGTGCAGG GGAAAGAATA GTAGACATAA TAGCAACAGA CATACAAACT AAAGAATTAC 4080
AAAAACAAAT TACAAAAATT CAAAATTTTC GGGTTTATTA CAGGGACAGC AGAAATTCAC 4140
TTTGGAAAGG ACCAGCAAAG CTCCTCTGGA AAGGTGAAGG GGCAGTAGTA ATACAAGATA 4200
ATAGTGACAT AAAAGTAGTG CCAAGAAGAA AAGCAAAGAT CATTAGGGAT TATGGAAAAC 4260
AGATGGCAGG TGATGATTGT GTGGCAAGTA GACAGGATGA GGATTAG 4307



SEQ. ID. NO. 2 - gagpol-SYNgp - codon optimised gagpol sequence

ATGGGCGCCC GCGCCAGCGT GCTGTCGGGC GGCGAGCTGG ACCGCTGGGA GAAGATCCGC 60 
CTGCGCCCCG GCGGCAAAAA GAAGTACAAG CTGAAGCACA TCGTGTGGGC CAGCCGCGAA 120
CTGGAGCGCT TCGCCGTGAA CCCCGGGCTC CTGGAGACCA GCGAGGGGTG CCGCCAGATC 180 
CTCGGCCAAC TGCAGCCCAG CCTGCAAACC GGCAGCGAGG AGCTGCGCAG CCTGTACAAC 240 
ACCGTGGCCA CGCTGTACTG CGTCCACCAG CGCATCGAAA TCAAGGATAC GAAAGAGGCC 300
CTGGATAAAA TCGAAGAGGA ACAGAATAAG AGCAAAAAGA AGGCCCAACA GGCCGCCGCG 360
GACACCGGAC ACAGCAACCA GGTCAGCCAG AACTACCCCA TCGTGCAGAA CATCCAGGGG 420
CAGATGGTGC ACCAGGCCAT CTCCCCCCGC ACGCTGAACG CCTGGGTGAA GGTGGTGGAA 480
GAGAAGGCTT TTAGCCCGGA GGTGATACCC ATGTTCTCAG CCCTGTCAGA GGGAGCCACC 540 
CCCCAAGATC TGAACACCAT GCTCAACACA GTGGGGGGAC ACCAGGCCGC CATGCAGATG 600 
CTGAAGGAGA CCATCAATGA GGAGGCTGCC GAATGGGATC GTGTGCATCC GGTGCACGCA 660 
GGGCCCATCG CACCGGGCCA GATGCGTGAG CCACGGGGCT CAGACATCGC CGGAACGACT 720 
AGTACCCTTC AGGAACAGAT CGGCTGGATG ACCAACAACC CACCCATCCC GGTGGGAGAA 780 
ATCTACAAAC GCTGGATCAT CCTGGGCCTG AACAAGATCG TGCGCATGTA TAGCCCTACC 840 
AGCATCCTGG ACATCCGCCA AGGCCCGAAG GAACCCTTTC GCGACTACGT GGACCGGTTC 900 
TACAAAACGC TCCGCGCCGA GCAGGCTAGC CAGGAGGTGA AGAACTGGAT GACCGAAACC 960 
CTGCTGGTCC AGAACGCGAA CCCGGACTGC AAGACGATCC TGAAGGCCCT GGGCCCAGCG 1020 
GCTACCCTAG AGGAAATGAT GACCGCCTGT CAGGGAGTGG GCGGACCCGG CCACAAGGCA 1080
CGCGTCCTGG CTGAGGCCAT GAGCCAGGTG ACCAACTCCG CTACCATCAT GATGCAGCGC 1140
GGCAACTTTC GGAACCAACG CAAGATCGTC AAGTGCTTCA ACTGTGGCAA AGAAGGGCAC 1200
ACAGCCCGCA ACTGCAGGGC CCCTAGGAAA AAGGGCTGCT GGAAATGCGG CAAGGAAGGC 1260
CACCAGATGA AAGACTGTAC TGAGAGACAG GCTAATTTTT TAGGGAAGAT CTGGCCTTCC 1320 
TACAAGGGAA GGCCAGGGAA TTTTCTTCAG AGCAGACCAG AGCCAACAGC CCCACCAGAA 1380
GAGAGCTTCA GGTCTGGGGT AGAGACAACA ACTCCCCCTC AGAAGCAGGA GCCGATAGAC 1440 
AAGGAACTGT ATCCTTTAAC TTCCCTCAGA TCACTCTTTG GCAACGACCC CTCGTCACAA 1500 
TAAAGATAGG GGGGCAGCTC AAGGAGGCTC TCCTGGACAC CGGAGCAGAC GACACCGTGC 1560 
TGGAGGAGAT GTCGTTGCCA GGCCGCTGGA AGCCGAAGAT GATCGGGGGA ATCGGCGGTT 1620
TCATCAAGGT GCGCCAGTAT GACCAGATCC TCATCGAAAT CTGCGGCCAC AAGGCTATCG 1680 
GTACCGTGCT GGTGGGCCCC ACACCCGTCA ACATCATCGG ACGCAACCTG TTGACGCAGA 1740
TCGGTTGCAC GCTGAACTTC CCCATTAGCC CTATCGAGAC GGTACCGGTG AAGCTGAAGC 1800
CCGGGATGGA CGGCCCGAAG GTCAAGCAAT GGCCATTGAC AGAGGAGAAG ATCAAGGCAC 1860
TGGTGGAGAT TTGCACAGAG ATGGAAAAGG AAGGGAAAAT CTCCAAGATT GGGCCTGAGA 1920
ACCCGTACAA CACGCCGGTG TTCGCAATCA AGAAGAAGGA CTCGACGAAA TGGCGCAAGC 1980
TGGTGGACTT CCGCGAGCTG AACAAGCGCA CGCAAGACTT CTGGGAGGTT CAGCTGGGCA 2040 
TCCCGCACCC CGCAGGGCTG AAGAAGAAGA AATCCGTGAC CGTACTGGAT GTGGGTGATG 2100 
CCTACTTCTC CGTTCCCCTG GACGAAGACT TCAGGAAGTA CACTGCCTTC ACAATCCCTT 2160
CGATCAACAA CGAGACACCG GGGATTCGAT ATCAGTACAA CGTGCTGCCC CAGGGCTGGA 2220
AAGGCTCTCC CGCAATCTTC CAGAGTAGCA TGACCAAAAT CCTGGAGCCT TTCCGCAAAC 2280
AGAACCCCGA CATCGTCATC TATCAGTACA TGGATGACTT GTACGTGGGC TCTGATCTAG 2340 
AGATAGGGCA GCACCGCACC AAGATCGAGG AGCTGCGCCA GCACCTGTTG AGGTGGGGAC 2400
TGACCACACC CGACAAGAAG CACCAGAAGG AGCCTCCCTT CCTCTGGATG GGTTACGAGC 2460
TGCACCCTGA CAAATGGACC GTGCAGCCTA TCGTGCTGCC AGAGAAAGAC AGCTGGACTG 2520
TCAACGACAT ACAGAAGCTG GTGGGGAAGT TGAACTGGGC CAGTCAGATT TACCCAGGGA 2580 
TTAAGGTGAG GCAGCTGTGC AAACTCCTCC GCGGAACCAA GGCACTCACA GAGGTGATCC 2640 
CCCTAACCGA GGAGGCCGAG CTCGAACTGG CAGAAAACCG AGAGATCCTA AAGGAGCCCG 2700 
TGCACGGCGT GTACTATGAC CCCTCCAAGG ACCTGATCGC CGAGATCCAG AAGCAGGGGC 2760
AAGGCCAGTG GACCTATCAG ATTTACCAGG AGCCCTTCAA GAACCTGAAG ACCGGCAAGT 2820
ACGCCCGGAT GAGGGGTGCC CACACTAACG ACGTCAAGCA GCTGACCGAG GCCGTGCAGA 2880
AGATCACCAC CGAAAGCATC GTGATCTGGG GAAAGACTCC TAAGTTCAAG CTGCCCATCC 2940
AGAAGGAAAC CTGGGAAACC TGGTGGACAG AGTATTGGCA GGCCACCTGG ATTCCTGAGT 3000
GGGAGTTCGT CAACACCCCT CCCCTGGTGA AGCTGTGGTA CCAGCTGGAG AAGGAGCCCA 3060
TAGTGGGCGC CGAAACCTTC TACGTGGATG GGGCCGCTAA CAGGGAGACT AAGCTGGGCA 3120
AAGCCGGATA CGTCACTAAC CGGGGCAGAC AGAAGGTTGT CACCCTCACT GACACCACCA 3180
ACCAGAAGAC TGAGCTGCAG GCCATTTACC TCGCTTTGCA GGACTCGGGC CTGGAGGTGA 3240
ACATCGTGAC AGACTCTCAG TATGCCCTGG GCATCATTCA AGCCCAGCCA GACCAGAGTG 3300
AGTCCGAGCT GGTCAATCAG ATCATCGAGC AGCTGATCAA GAAGGAAAAG GTCTATCTGG 3360
CCTGGGTACC CGCCCACAAA GGCATTGGCG GCAATGAGCA GGTCGACAAG CTGGTCTCGG 3420
CTGGCATCAG GAAGGTGCTA TTCCTGGATG GCATCGACAA GGCCCAGGAC GAGCACGAGA 3480
AATACCACAG CAACTGGCGG GCCATGGCTA GCGACTTCAA CCTGCCCCCT GTGGTGGCCA 3540
AAGAGATCGT GGCCAGCTGT GACAAGTGTC AGCTCAAGGG CGAAGCCATG CATGGCCAGG 3600
TGGACTGTAG CCCCGGCATC TGGCAACTCG ATTGCACCCA TCTGGAGGGC AAGGTTATCC 3660
TGGTAGCCGT CCATGTGGCC AGTGGCTACA TCGAGGCCGA GGTCATTCCC GCCGAAACAG 3720
GGCAGGAGAC AGCCTACTTC CTCCTGAAGC TGGCAGGCCG GTGGCCAGTG AAGACCATCC 3780
ATACTGACAA TGGCAGCAAT TTCACCAGTG CTACGGTTAA GGCCGCCTGC TGGTGGGCGG 3840
GAATCAAGCA GGAGTTCGGG ATCCCCTACA ATCCCCAQAG TCAGGGCGTC GTCGAGTCTA 3900
TGAATAAGGA GTTAAAGAAG ATTATCGGCC AGGTCAGAGA TCAGGCTGAG CATCTCAAGA 3960
CCGCGGTCCA AATGGCGGTA TTCATCCACA ATTTCAAGCG GAAGGGGGGG ATTGGGGGGT 4020
ACAGTGCGGG GGAGCGGATC GTGGACATCA TCGCGACCGA CATCCAGACT AAGGAGCTGC 4080
AAAAGCAGAT TACCAAGATT CAGAATTTCC GGGTCTACTA CAGGGACAGC AGAAATCCCC 4140
TCTGGAAAGG CCCAGCGAAG CTCCTCTGGA AGGGTGAGGG GGCAGTAGTG ATCCAGGATA 4200
ATAGCGACAT CAAGGTGGTG CCCAGAAGAA AGGCGAAGAT CATTAGGGAT TATGGCAAAC 4260
AGATGGCGGG TGATGATTGC GTGGCGAGCA GACAGGATGA GGATTAG 4307



SEQ. ID. NO. 3 - Envelope Gene from HIV-1 MN (Genbank accession no. M17449)

ATGAGAGTGA AGGGGATCAG GAGGAATTAT CAGCACTGGT GGGGATGGGG CACGATGCTC 60 
CTTGGGTTAT TAATGATCTG TAGTGGTACA GAAAAATTGT GGGTCACAGT CTATTATGGG 120 
GTACCTGTGT GGAAAGAAGC AACCACCACT CTATTTTGTG CATCAGATGC TAAAGCATAT 180 
GATACAGAGG TACATAATGT TTGGGCCACA CAAGCCTGTG TACCCACAGA CCCCAACCCA 240 
CAAGAAGTAG AATTGGTAAA TGTGACAGAA AATTTTAACA TGTGGAAAAA TAACATGGTA 3 00 
GAACAGATGC ATGAGGATAT AATCAGTTTA TGGGATCAAA GCCTAAAGCC ATGTGTAAAA 360 
TTAACCCCAC TCTGTGTTAC TTTAAATTGC ACTGATTTGA GGAATACTAC TAATACCAAT 420 
AATAGTACTG CTAATAACAA TAGTAATAGC GAGGGAACAA TAAAGGGAGG AGAAATGAAA 480 
AACTGCTCTT TCAATATCAC CACAAGCATA AGAGATAAGA TGCAGAAAGA ATATGCACTT 54 0 
CTTTATAAAC TTGATATAGT ATCAATAGAT AATGATAGTA CCAGCTATAG GTTGATAAGT 600 
TGTAATACCT CAGTCATTAC ACAAGCTTGT CCAAAGATAT CCTTTGAGCC AATTCCCATA 660 
CACTATTGTG CCCCGGCTGG TTTTGCGATT CTAAAATGTA ACGATAAAAA GTTCAGTGGA 72 0 
AAAGGATCAT GTAAAAATGT CAGCACAGTA CAATGTACAC ATGGAATTAG GCCAGTAGTA 780 
TCAACTCAAC TGCTGTTAAA TGGCAGTCTA GCAGAAGAAG AGGTAGTAAT TAGATCTGAG 840 
AATTTCACTG ATAATGCTAA AACCATCATA GTACATCTGA ATGAATCTGT ACAAATTAAT 900 
TGTACAAGAC CCAACTACAA TAAAAGAAAA AGGATACATA TAGGACCAGG GAGAGCATTT 960 
TATACAACAA AAAATATAAT AGGAACTATA AGACAAGCAC ATTGTAACAT TAGTAGAGCA 1020 
AAATGGAATG ACACTTTAAG ACAGATAGTT AGCAAATTAA AAGAACAATT TAAGAATAAA 1080 
ACAATAGTCT TTAATCAATC CTCAGGAGGG GACCCAGAAA TTGTAATGCA CAGTTTTAAT 1140 
TGTGGAGGGG AATTTTTCTA CTGTAATACA TCACCACTGT TTAATAGTAC TTGGAATGGT 1200 
AATAATACTT GGAATAATAC TACAGGGTCA AATAACAATA TCACACTTCA ATGCAAAATA 1260 
AAACAAATTA TAAACATGTG GCAGGAAGTA GGAAAAGCAA TGTATGCCCC TCCCATTGAA 1320 
GGACAAATTA GATGTTCATC AAATATTACA GGGCTACTAT TAACAAGAGA TGGTGGTAAG 1380 
GACACGGACA CGAACGACAC CGAGATCTTC AGACCTGGAG GAGGAGATAT GAGGGACAAT 1440 
TGGAGAAGTG AATTATATAA ATATAAAGTA GTAACAATTG AACCATTAGG AGTAGCACCC 1500 
ACCAAGGCAA AGAGAAGAGT GGTGCAGAGA GAAAAAAGAG CAGCGATAGG AGCTCTGTTC 1560 
CTTGGGTTCT TAGGAGCAGC AGGAAGCACT ATGGGCGCAG CGTCAGTGAC GCTGACGGTA 1620 
CAGGCCAGAC TATTATTGTC TGGTATAGTG CAACAGCAGA ACAATTTGCT GAGGGCCATT 1680 
GAGGCGCAAC AGCATATGTT GCAACTCACA GTCTGGGGCA TCAAGCAGCT CCAGGCAAGA 1740 
GTCCTGGCTG TGGAAAGATA CCTAAAGGAT CAACAGCTCC TGGGGTTTTG GGGTTGCTCT 1800 
GGAAAACTCA TTTGCACCAC TACTGTGCCT TGGAATGCTA GTTGGAGTAA TAAATCTCTG 1860 
GATGATATTT GGAATAACAT GACCTGGATG CAGTGGGAAA GAGAAATTGA CAATTACACA 1920 
AGCTTAATAT ACTCATTACT AGAAAAATCG CAAACCCAAC AAGAAAAGAA TGAACAAGAA 1980 
TTATTGGAAT TGGATAAATG GGCAAGTTTG TGGAATTGGT TTGACATAAC AAATTGGCTG 2040 
TGGTATATAA AAATATTCAT AATGATAGTA GGAGGCTTGG TAGGTTTAAG AATAGTTTTT 2100 
GCTGTACTTT CTATAGTGAA TAGAGTTAGG CAGGGATACT CACCATTGTC GTTGCAGACC 2160 
CGCCCCCCAG TTCCGAGGGG ACCCGACAGG CCCGAAGGAA TCGAAGAAGA AGGTGGAGAG 2220 
AGAGACAGAG ACACATCCGG TCGATTAGTG CATGGATTCT TAGCAATTAT CTGGGTCGAC 2280

CTGCGGAGCC TGTTCCTCTT CAGCTACCAC CACAGAGACT TACTCTTGAT TGCAGCGAGG 234 0 

ATTGTGGAAC TTCTGGGACG CAGGGGGTGG GAAGTCCTCA AATATTGGTG GAATCTCCTA 24 00 

CAGTATTGGA GTCAGGAACT AAAGAGTAGT GCTGTTAGCT TGCTTAATGC CACAGCTATA 246 0 

GCAGTAGCTG AGGGGACAGA TAGGGTTATA GAAGTACTGC AAAGAGCTGG TAGAGCTATT 2 52 0 

CTCCACATAC CTACAAGAAT AAGACAGGGC TTGGAAAGGG CTTTGCTATA A 2 571 



SEQ. ID. NO. 4 - SYNgp-160mn - codon optimised env sequence

ATGAGGGTGA AGGGGATCCG CCGCAACTAC CAGCACTGGT GGGGCTGGGG CACGATGCTC 6 0 

CTGGGGCTGC TGATGATCTG CAGCGCCACC GAGAAGCTGT GGGTGACCGT GTACTACGGC 12 0 

GTGCCCGTGT GGAAGGAGGC CACCACCACC CTGTTCTGCG CCAGCGACGC CAAGGCGTAC 180 

GACACCGAGG TGCACAACGT GTGGGCCACC CAGGCGTGCG TGCCCACCGA CCCCAACCCC 24 0 

CAGGAGGTGG AGCTCGTGAA CGTGACCGAG AACTTCAACA TGTGGAAGAA CAACATGGTG 3 00 

GAGCAGATGC ATGAGGACAT CATCAGCCTG TGGGACCAGA GCCTGAAGCC CTGCGTGAAG 360 

CTGACCCCCC TGTGCGTGAC CCTGAACTGC ACCGACCTGA GGAACACCAC CAACACCAAC 42 0 

AACAGCACCG CCAACAACAA CAGCAACAGC GAGGGCACCA TCAAGGGCGG CGAGATGAAG 48 0 

AACTGCAGCT TCAACATCAC CACCAGCATC CGCGACAAGA TGCAGAAGGA GTACGCCCTG 540 

CTGTACAAGC TGGATATCGT GAGCATCGAC AACGACAGCA CCAGCTACCG CCTGATCTCC 6 00 

TGCAACACCA GCGTGATCAC CCAGGCCTGC CCCAAGATCA GCTTCGAGCC CATCCCCATC 660 

CACTACTGCG CCCCCGCCGG CTTCGCCATC CTGAAGTGCA ACGACAAGAA GTTCAGCGGC 72 0 

AAGGGCAGCT GCAAGAACGT GAGCACCGTG CAGTGCACCC ACGGCATCCG GCCGGTGGTG 780 

AGCACCCAGC TCCTGCTGAA CGGCAGCCTG GCCGAGGAGG AGGTGGTGAT CCGCAGCGAG 84 0 

AACTTCACCG ACAACGCCAA GACCATCATC GTGCACCTGA ATGAGAGCGT GCAGATCAAC 900 

TGCACGCGTC CCAACTACAA CAAGCGCAAG CGCATCCACA TCGGCCCCGG GCGCGCCTTC 960 

TACACCACCA AGAACATCAT CGGCACCATC CGCCAGGCCC ACTGCAACAT CTCTAGAGCC 102 0 

AAGTGGAACG ACACCCTGCG CCAGATCGTG AGCAAGCTGA AGGAGCAGTT CAAGAACAAG 108 0 

ACCATCGTGT TCAACCAGAG CAGCGGCGGC GACCCCGAGA TCGTGATGCA CAGCTTCAAC 114 0 

TGCGGCGGCG AATTCTTCTA CTGCAACACC AGCCCCCTGT TCAACAGCAC CTGGAACGGC 1200

AACAACACCT GGAACAACAC CACCGGCAGC AACAACAATA TTACCCTCCA GTGCAAGATC 1260 

AAGCAGATCA TCAACATGTG GCAGGAGGTG GGCAAGGCCA TGTACGCCCC CCCCATCGAG 132 0 

GGCCAGATCC GGTGCAGCAG CAACATCACC GGTCTGCTGC TGACCCGCGA CGGCGGCAAG 138 0 

GACACCGACA CCAACGACAC CGAAATCTTC CGCCCCGGCG GCGGCGACAT GCGCGACAAC 144 0 

TGGAGATCTG AGCTGTACAA GTACAAGGTG GTGACGATCG AGCCCCTGGG CGTGGCCCCC 1500 

ACCAAGGCCA AGCGCCGCGT GGTGCAGCGC GAGAAGCGGG CCGCCATCGG CGCCCTGTTC 1560 

CTGGGCTTCC TGGGGGCGGC GGGCAGCACC ATGGGGGCCG CCAGCGTGAC CCTGACCGTG 162 0 

CAGGCCCGCC TGCTCCTGAG CGGCATCGTG CAGCAGCAGA ACAACCTCCT CCGCGCCATC 168 0 

GAGGCCCAGC AGCATATGCT CCAGCTCACC GTGTGGGGCA TCAAGCAGCT CCAGGCCCGC 1740 

GTGCTGGCCG TGGAGCGCTA CCTGAAGGAC CAGCAGCTCC TGGGCTTCTG GGGCTGCTCC 180 0 

GGCAAGCTGA TCTGCACCAC CACGGTACCC TGGAACGCCT CCTGGAGCAA CAAGAGCCTG 1860

GACGACATCT GGAACAACAT GACCTGGATG CAGTGGGAGC GCGAGATCGA TAACTACACC 192 0 

AGCCTGATCT ACAGCCTGCT GGAGAAGAGC CAGACCCAGC AGGAGAAGAA CGAGCAGGAG 198 0 

CTGCTGGAGC TGGACAAGTG GGCGAGCCTG TGGAACTGGT TCGACATCAC CAACTGGCTG 2040 

TGGTACATCA AAATCTTCAT CATGATTGTG GGCGGCCTGG TGGGCCTCCG CATCGTGTTC 2100 

GCCGTGCTGA GCATCGTGAA CCGCGTGCGC CAGGGCTACA GCCCCCTGAG CCTCCAGACC 216 0 

CGGCCCCCCG TGCCGCGCGG GCCCGACCGC CCCGAGGGCA TCGAGGAGGA GGGCGGCGAG 222 0 

CGCGACCGCG ACACCAGCGG CAGGCTCGTG CACGGCTTCC TGGCGATCAT CTGGGTCGAC 22 80 

CTCCGCAGCC TGTTCCTGTT CAGCTACCAC CACCGCGACC TGCTGCTGAT CGCCGCCCGC 234 0 

ATCGTGGAAC TCCTAGGCCG CCGCGGCTGG GAGGTGCTGA AGTACTGGTG GAACCTCCTC 24 00 

CAGTATTGGA GCCAGGAGCT GAAGTCCAGC GCCGTGAGCC TGCTGAACGC CACCGCCATC 246 0 

GCCGTGGCCG AGGGCACCGA CCGCGTGATC GAGGTGCTCC AGAGGGCCGG GAGGGCGATC 252 0 

CTGCACATCC CCACCCGCAT CCGCCAGGGG CTCGAGAGGG CGCTGCTGTA A 2571 



SEQ. ID. NO. 11 - Complete Sequence of pH4DOZENEGS

CTGACGCGCC CTGTAGCGGC GCATTAAGCG CGGCGGGTGT GGTGGTTACG CGCAGCGTGA 60 

CCGCTACACT TGCCAGCGCC CTAGCGCCCG CTCCTTTCGC TTTCTTCCCT TCCTTTCTCG 120 

CCACGTTCGC CGGCTTTCCC CGTCAAGCTC TAAATCGGGG GCTCCCTTTA GGGTTCCGAT 18 0 

TTAGTGCTTT ACGGCACCTC GACCCCAAAA AACTTGATTA GGGTGATGGT TCACGTAGTG 24 0 

GGCCATCGCC CTGATAGACG GTTTTTCGCC CTTTGACGTT GGAGTCCACG TTCTTTAATA 3 00 

GTGGACTCTT GTTCCAAACT GGAACAACAC TCAACCCTAT CTCGGTCTAT TCTTTTGATT 360 

TATAAGGGAT TTTGCCGATT TCGGCCTATT GGTTAAAAAA TGAGCTGATT TAACAAAAAT 42 0 

TTAACGCGAA TTTTAACAAA ATATTAACGC TTACAATTTC CATTCGCCAT TCAGGCTGCG 480 
GAACTGTTGG GAAGGGCGAT CGGTGCGGGC CTCTTCGCTA TTACGCCAGC TGGCGAAAGG 540

GGGATGTGCT GCAAGGCGAT TAAGTTGGGT AACGCCAGGG TTTTCCCAGT CACGACGTTG 600 
TAAAACGACG GCCAGTGAGC GCGCGTAATA CGACTCACTA TAGGGCGAAT TGGAGCTCCA 660 

CCGCGGTGGC GGCCGCTCTA GAGTCCGTTA CATAACTTAC GGTAAATGGC CCGCCTGGCT 72 0 

GACCGCCCAA CGACCCCCGC CCATTGACGT CAATAATGAC GTATGTTCCC ATAGTAACGC 78 0 

CAATAGGGAC TTTCCATTGA CGTCAATGGG TGGAGTATTT ACGGTAAACT GCCCACTTGG 84 0 

CAGTACATCA AGTGTATCAT ATGCCAAGTA CGCCCCCTAT TGACGTCAAT GACGGTAAAT 900 

GGCCCGCCTG GCATTATGCC CAGTACATGA CCTTATGGGA CTTTCCTACT TGGCAGTACA 960 

TCTACGTATT AGTCATCGCT ATTACCATGG TGATGCGGTT TTGGCAGTAC ATCAATGGGC 102 0 

GTGGATAGCG GTTTGACTCA CGGGGATTTC CAAGTCTCCA CCCCATTGAC GTCAATGGGA 108 0 

GTTTGTTTTG GCACCAAAAT CAACGGGACT TTCCAAAATG TCGTAACAAC TCCGCCCCAT 114 0 

TGACGCAAAT GGGCGGTAGG CGTGTACGGT GGGAGGTCTA TATAAGCAGA GCTCGTTTAG 1200 

TGAACCGGTC TCTCTGGTTA GACCAGATCT GAGCCTGGGA GCTCTCTGGC TAACTAGGGA 1260 

ACCCACTGCT TAAGCCTCAA TAAAGCTTGC CTTGAGTGCT TCAAGTAGTG TGTGCCCGTC 1320 

TGTTGTGTGA CTCTGGTAAC TAGAGATCCC TCAGACCCTT TTAGTCAGTG TGGAAAATCT 13 80 

CTAGCAGTGG CGCCCGAACA GGGACTTGAA AGCGAAAGGG AAACCAGAGG AGCTCTCTCG 1440 

ACGCAGGACT CGGCTTGCTG AAGCGCGCAC GGCAAGAGGC GAGGGGCGGC GACTGGTGAG 1500 

TACGCCAAAA ATTTTGACTA GCGGAGGCTA GAAGGAGAGA GATGGGTGCG AGAGCGTCAG 156 0 

TATTAAGGGG GGGAGAATTA GATCGCGATG GGAAAAAATT CGGTTAAGGC CAGGGGGAAA 162 0 

GAAAAAATAT AAATTAAAAC ATATAGTATG GGCAAGCAGG GAGCTAGAAC GATTCGCAGT 168 0 

TAATCCTGGC CTGTTAGAAA CATCAGAAGG CTGTAGACAA ATACTGGGAC AGCTACAACC 1740 

ATCCCTTCAG ACAGGATCAG AAGAACTTAG ATCATTATAT AATACAGTAG CAACCCTCTA 18 00 

TTGTGTGCAT CAAAGGTTGA GATAAAAGAC ACCAAGGAAG CTTTAGACAA GATAGAGGGA 1860 

GAGCAAAACA AAAGTAAGAA AAAAGCACAG CAAGCAGCAG CTGACACAGG ACACAGCAAT 192 0 

CAGGTCAGCC AAAATTACCC TATAGTGCAG AACATCCAGG GGCAAATGGT ACATCAGGCC 1980

ATATCACCTA GAACTTTAAA TGCATGGGTA AAAGTAGTAG AAGAGAAGGC TTTCAGCCCA 204 0 

GAAGTGATAC CCATGTTTTC AGCATTATCA GAAGGAGCCA CCCCACAAGA TTTAAACACC 2100 

ATGCTAAACA CAGTGGGGGG ACATCAAGCA GCCATGCAAA TGTTAAAAGA GACCATCAAT 216 0 

GAGGAAGCTG CAGGAATTCG CCTAAAACTG CTTGTACCAA TTGCTATTGT AAAAAGTGTT 222 0 

GCTTTCATTG CCAAGTTTGT TTCATAACAA AAGCCTTAGG CATCTCCTAT GGCAGGAAGA 228 0 

AGCGGAGACA GCGACGAAGA GCTCATCAGA ACAGTCAGAC TCATCAAGCT TCTCTATCAA 2340 

AGCAGTAAGT AGTACATGTA ACGCAACCTA TACCAATAGT AGCAATAGTA GCATTAGTAG 240 0 

TAGCAATAAT AATAGCAATA GTTGTGTGGT CCATAGTAAT CATAGAATAT AGGAAAATAT 246 0 

TAAGACAAAG AAAAATAGAC AGGTTAATTG ATAGACTAAT AGAAAGAGCA GAAGACAGTG 2520 

GCAATGAGAG TGAAGGAGAA ATATCAGCAC TTGTGGAGAT GGGGGTGGAG ATGGGGCACC 2580 

ATGCTCCTTG GGATGTTGAT GATCTGTAGT GCTACAGAAA AATTGTGGGT CACAGTCTAT 2640 

TATGGGGTAC CTGTGTGGAA GGAAGCAACC ACCACTCTAT TTTGTGCATC AGATGCTAAA 2 700 

GCATAGATCT TCAGACTTGG AGGAGGAGAT ATGAGGGACA ATTGGAGAAG TGAATTATAT 2 760 

AAATATAAAG TAGTAAAAAT TGAACCATTA GGAGTAGCAC CCACCAAGGC AAAGAGAAGA 2820 

GTGGTGCAGA GAGAAAAAAG AGCAGTGGGA ATAGGAGCTT TGTTCCTTGG GTTCTTGGGA 28 8 0 

GCAGCAGGAA GCACTATGGG CGCAGCGTCA ATGACGCTGA CGGTACAGGC CAGACAATTA 294 0 

TTGTCTGGTA TAGTGCAGCA GCAGAACAAT TTGCTGAGGG CTATTGAGGC GCAACAGCAT 300 0 

CTGTTGCAAC TCACAGTCTG GGGCATCAAG CAGCTCCAGG CAAGAATCCT GGCTGTGGAA 3060 

AGATACCTAA AGGATCAACA GCTCCTGGGG ATTTGGGGTT GCTCTGGAAA ACTCATTTGC 3120 

ACCACTGCTG TGCCTTGGAA TGCTAGTTGG AGTAATAAAT CTCTGGAACA GATCTGGAAT 3180 

CACACGACCT GGATGGAGTG GGACAGAGAA ATTAACAATT ACACAAGCTT AATACACTCC 3240 

TTAATTGAAG AATCGCAAAA CCAGCAAGAA AAGAATGAAC AAGAATTATT GGAATTAGAT 33 00 

AAATGGGCAA GTTTGTGGAA TTGGTTTAAC ATAACAAATT GGCTGTGGTA TATAAAATTA 3360 

TTCATAATGA TAGTAGGAGG CTTGGTAGGT TTAAGAATAG TTTTTGCTGT ACTTTCTATA 342 0 

GTGAATAGAG TTAGGCAGGG ATATTCACCA TTATCGTTTC AGACCCACCT CCCAACCCCG 3480 

AGGGGACCCG ACAGGCCCGA AGGAATAGAA GAAGAAGGTG GAGAGAGAGA CAGAGACAGA 3540

TCCATTCGAT TAGTGAACGG ATCCTTGGCA CTTATCTGGG ACGATCTGCG GAGCCTGTGC 3600 

CTCTTCAGCT ACCACCGCTT GAGAGACTTA CTCTTGATTG TAACGAGGAT TGTGGAACTT 3 66 0 

CTGGGACGCA GGGGGTGGGA AGCCCTCAAA TATTGGTGGA ATCTCCTACA GTATTGGAGT 3 72 0 

CAGGAACTAA AGAATAGTGC TGTTAGCTTG CTCAATGCCA CAGCCATAGC AGTAGCTGAG 3 78 0 

GGGACAGATA GGGTTATAGA AGTAGTACAA GGAGCTTGTA GAGCTATTCG CCACATACCT 3 840 

AGAAGAATAA GACAGGGCTT GGAAAGGATT TTGCTATAAG ATGGGTGGCA AGTGGTCAAA 3 90 0 

AAGTAGTGTG ATTGGATGGC CTACTGTAAG GGAAAGAATG AGACGAGCTG AGCCAGCAGC 3 960 

AGATAGGGTG GGAGCAGCAT CTCGACGCTG CAGGAGTGGG GAGGCACGAT GGCCGCTTTG 4 02 0 

GTCGAGGCGG ATCCGGCCAT TAGCCATATT ATTCATTGGT TATATAGCAT AAATCAATAT 4 080 

TGGCTATTGG CCATTGCATA CGTTGTATCC ATATCATAAT ATGTACATTT ATATTGGCTC 414 0 

ATGTCCAACA TTACCGCCAT GTTGACATTG ATTATTGACT AGTTATTAAT AGTAATCAAT 42 0 0 
TACGGGGTCA TTAGTTCATA GCCCATATAT GGAGTTCCGC GTTACATAAC TTACGGTAAA 4260 

TGGCCCGCCT GGCTGACCGC CCAACGACCC CCGCCCATTG ACGTCAATAA TGACGTATGT 432 0 
TCCCATAGTA ACGCCAATAG GGACTTTCCA TTGACGTCAA TGGGTGGAGT ATTTACGGTA 4380 
AACTGCCCAC TTGGCAGTAC ATCAAGTGTA TCATATGCCA AGTACGCCCC CTATTGACGT 4440
CAATGACGGT AAATGGCCCG CCTGGCATTA TGCCCAGTAC ATGACCTTAT GGGACTTTCC 450 0 
TACTTGGCAG TACATCTACG TATTAGTCAT CGCTATTACC ATGGTGATGC GGTTTTGGCA 4560 
GTACATCAAT GGGCGTGGAT AGCGGTTTGA CTCACGGGGA TTTCCAAGTC TCCACCCCAT 462 0 
TGACGTCAAT GGGAGTTTGT TTTGGCACCA AAATCAACGG GACTTTCCAA AATGTCGTAA 468 0 
CAACTCCGCC CCATTGACGC AAATGGGCGG TAGGCATGTA CGGTGGGAGG TCTATATAAG 4 74 0 
CAGAGCTCGT TTAGTGAACC GTCAGATCGC CTGGAGACGC CATCCACGCT GTTTTGACCT 4 8 00 
CCATAGAAGA CACCGGGACC GATCCAGCCT CCGCGGCCCC AAGCTTCAGC TGCTCGAGCC 486 0 
CGGGGATGAC GTCATCGACT TCGAAGGTTC GAATCCTTCT ACTGCCACCA TTTTTTCTCT 4 920 
ACGTCATCGA CTTCGAAGGT TCGAATCCTT CCCTGTCCAC CAGTCGAGTA TTACGTCATC 4 98 0 
GACTTCGAAG GTTCGAATCC TTCTAGATTC ACCATTTTTT AGGAACGTCA TCGACTTCGA 504 0 
AGGTTCGAAT CCTTCCAGTT CCACCAGTCG AGGCCAACGT CATCGACTTC GAAGGTTCGA 5100 
ATCCTTCTCT TCCCACCATT TTTTTTCCAC GTCATCGACT TCGAAGGTTC GAATCCTTCG 516 0 
GGGCCCACCA GTCGAGGGCT ACGTCATCGA CTTCGAAGGT TCGAATCCTT CTTGCTTCAC 52 2 0 
CATTTTTTCT GAACGTCATC GACTTCGAAG GTTCGAATCC TTCTGCTGTC ACCAGTCGAG 52 80 
TATAACGTCA TCGACTTCGA AGGTTCGAAT CCTTCACCGG TCACCATTTT TTTATAACGT 5340 
CATCGACTTC GAAGGTTCGA ATCCTTCTTC TTACACCAGT CGAGGTACAC GTCATCGACT 5400 
TCGAAGGTTC GAATCCTTCG TAGTTCACCA TTTTTTGTGC ACGTCATCGA CTTCGAAGGT 5460 
TCGAATCCTT CTAGGCCCAC CAGTCGACGC ATGCCTGCAG GTCGAGGTCG ATACCGTCGA 552 0 
GACCTAGAAA AACATGGAGC AATCACAAGT AGCAATACAG CAGCTACCAA TGCTGATTGT 558 0 
GCCTGGCTAG AAGCACAAGA GGAGGAGGAG GTGGGTTTTC CAGTCACACC TCAGGTACCT 564 0 
TTAAGACCAA TGACTTACAA GGCAGCTGTA GATCTTAGCC ACTTTTTAAA AGAAAAGGGG 570 0 
GGACTGGAAG GGCTAATTCA CTCCCAACGA AGACAAGATA TCCTTGATCT GTGGATCTAC 576 0 
CACACACAAG GCTACTTCCC TGATTGGCAG AACTACACAC CAGGGCCAGG GATCAGATAT 582 0 
CCACTGACCT TTGGATGGTG CTACAAGCTA GTACCAGTTG AGCAAGAGAA GGTAGAAGAA 58 8 0 
GCCAATGAAG GAGAGAACAC CCGCTTGTTA CACCCTGTGA GCCTGCATGG GATGGATGAC 594 0 
CCGGAGAGAG AAGTATTAGA GTGGAGGTTT GACAGCCGCC TAGCATTTCA TCACATGGCC 600 0 
CGAGAGCTGC ATCCGGAGTA CTTCAAGAAC TGCTGACATC GAGCTTGCTA CAAGGGACTT 6060 
TCCGCTGGGG ACTTTCCAGG GAGGCGTGGC CTGGGCGGGA CTGGGGAGTG GCGAGCCCTC 6120 
AGATGCTGCA TATAAGCAGC TGCTTTTTGC CTGTACTGGG TCTCTCTGGT TAGACCAGAT 618 0 
CTGAGCCTGG GAGCTCTCTG GCTAACTAGG GAACCCACTG CTTAAGCCTC AATAAAGCTT 624 0 
GCCTTGAGTG CTTCAAGTAG TGTGTGCCCG TCTGTTGTGT GACTCTGGTA ACTAGAGATC 63 00 
CCTCAGACCC TTTTAGTCAG TGTGGAAAAT CTCTAGCAGT CGAGGGGGGG CCCGGTACCC 63 60 
AGCTTTTGTT CCCTTTAGTG AGGGTTAATT GCGCGCTTGG CGTAATCATG GTCATAGCTG 642 0 
TTTCCTGTGT GAAATTGTTA TCCGCTCACA ATTCCACACA ACATACGAGC CGGAAGCATA 64 8 0 
AAGTGTAAAG CCTGGGGTGC CTAATGAGTG AGCTAACTCA CATTAATTGC GTTGCGCTCA 6 540 
CTGCCCGCTT TCCAGTCGGG AAACCTGTCG TGCCAGCTGC ATTAATGAAT CGGCCAACGC 66 00 
GCGGGGAGAG GCGGTTTGCG TATTGGGCGC TCTTCCGCTT CCTCGCTCAC TGACTCGCTG 6660 
CGCTCGGTCG TTCGGCTGCG GCGAGCGGTA TCAGCTCACT CAAAGGCGGT AATACGGTTA 6720 
TCCACAGAAT CAGGGGATAA CGCAGGAAAG AACATGTGAG CAAAAGGCCA GCAAAAGGCC 6 780 
AGGAACCGTA AAAAGGCCGC GTTGCTGGCG TTTTTCCATA GGCTCCGCCC CCCTGACGAG 684 0 
CATCACAAAA ATCGACGCTC AAGTCAGAGG TGGCGAAACC CGACAGGACT ATAAAGATAC 6900 
CAGGCGTTTC CCCCTGGAAG CTCCCTCGTG CGCTCTCCTG TTCCGACCCT GCCGCTTACC 696 0 
GGATACCTGT CCGCCTTTCT CCCTTCGGGA AGCGTGGCGC TTTCTCATAG CTCACGCTGT 702 0 
AGGTATCTCA GTTCGGTGTA GGTCGTTCGC TCCAAGCTGG GCTGTGTGCA CGAACCCCCC 7080 
GTTCAGCCCG ACCGCTGCGC CTTATCCGGT AACTATCGTC TTGAGTCCAA CCCGGTAAGA 714 0 
CACGACTTAT CGCCACTGGC AGCAGCCACT GGTAACAGGA TTAGCAGAGC GAGGTATGTA 72 00 
GGCGGTGCTA CAGAGTTCTT GAAGTGGTGG CCTAACTACG GCTACACTAG AAGGACAGTA 726 0 
TTTGGTATCT GCGCTCTGCT GAAGCCAGTT ACCTTCGGAA AAAGAGTTGG TAGCTCTTGA 732 0 
TCCGGCAAAC AAACCACCGC TGGTAGCGGT GGTTTTTTTG TTTGCAAGCA GCAGATTACG 73 8 0 
CGCAGAAAAA AAGGATCTCA AGAAGATCCT TTGATCTTTT CTACGGGGTC TGACGCTCAG 744 0 
TGGAACGAAA ACTCACGTTA AGGGATTTTG GTCATGAGAT TATCAAAAAG GATCTTCACC 75 0 0 
TAGATCCTTT TAAATTAAAA ATGAAGTTTT AAATCAATCT AAAGTATATA TGAGTAAACT 756 0 
TGGTCTGACA GTTACCAATG CTTAATCAGT GAGGCACCTA TCTCAGCGAT CTGTCTATTT 762 0 
CGTTCATCCA TAGTTGCCTG ACTCCCCGTC GTGTAGATAA CTACGATACG GGAGGGCTTA 768 0 
CCATCTGGCC CCAGTGCTGC AATGATACCG CGAGACCCAC GCTCACCGGC TCCAGATTTA 774 0 
TCAGCAATAA ACCAGCCAGC CGGAAGGGCC GAGCGCAGAA GTGGTCCTGC AACTTTATCC 78 0 0 
GCCTCCATCC AGTCTATTAA TTGTTGCCGG GAAGCTAGAG TAAGTAGTTC GCCAGTTAAT 786 0 
AGTTTGCGCA ACGTTGTTGC CATTGCTACA GGCATCGTGG TGTCACGCTC GTCGTTTGGT 792 0 
ATGGCTTCAT TCAGCTCCGG TTCCCAACGA TCAAGGCGAG TTACATGATC CCCCATGTTG 7980 
TGCAAAAAAG CGGTTAGCTC CTTCGGTCCT CCGATCGTTG TCAGAAGTAA GTTGGCCGCA 804 0 
GTGTTATCAC TCATGGTTAT GGCAGCACTG CATAATTCTC TTACTGTCAT GCCATCCGTA 8100 
AGATGCTTTT CTGTGACTGG TGAGTACTCA ACCAAGTCAT TCTGAGAATA GTGTATGCGG 8160 
CGACCGAGTT GCTCTTGCCC GGCGTCAATA CGGGATAATA CCGCGCCACA TAGCAGAACT 822 0 
TTAAAAGTGC TCATCATTGG AAAACGTTCT TCGGGGCGAA AACTCTCAAG GATCTTACCG 8280 
CTGTTGAGAT CCAGTTCGAT GTAACCCACT CGTGCACCCA ACTGATCTTC AGCATCTTTT 8340

ACTTTCACCA GCGTTTCTGG GTGAGCAAAA ACAGGAAGGC AAAATGCCGC AAAAAAGGGA 84 00 

ATAAGGGCGA CACGGAAATG TTGAATACTC ATACTCTTCC TTTTTCAATA TTATTGAAGC 846 0 

ATTTATCAGG GTTATTGTCT CATGAGCGGA TACATATTTG AATGTATTTA GAAAAATAAA 8 52 0 

CAAATAGGGG TTCCGCGCAC ATTTCCCCGA AAAGTGCCAC 8 560 

SEQ. ID. NO. 12 - pSYNGP2 - codon optimised HIV-1 gagpol with leader sequence

1 GGGTCTCTCT GGTTAGACCA GATCTGAGCC TGGGAGCTCT CTGGCTAACT AGGGAACCCA 
61 CTGCTTAAGC CTCAATAAAG CTTGCCTTGA GTGCTTCAAG TAGTGTGTGC CCGTCTGTTG 
121 TGTGACTCTG GTAACTAGAG ATCCCTCAGA CCCTTTTAGT CAGTGTGGAA AATCTCTAGC 
181 AGTGGCGCCC GAACAGGGAC CTGAAAGCGA AAGGGAAACC AGAGCTCTCT CGACGCAGGA 
241 CTCGGCTTGC TGAAGCGCCC GCACGGCAAG AGGCGAGGGG CGGCGACTGG TGAGTACGCC 
3 01 AAAAATTTTG ACTAGCGGAG GCTAGAAGGA GAGAGATGGG CGCCCGCGCC AGCGTGCTGT 
361 CGGGCGGCGA GCTGGACCGC TGGGAGAAGA TCCGCCTGCG CCCCGGCGGC AAAAAGAAGT 
421 ACAAGCTGAA GCACATCGTG TGGGCCAGCC GCGAACTGGA GCGCTTCGCC GTGAACCCCG 
481 GGCTCCTGGA GACCAGCGAG GGGTGCCGCC AGATCCTCGG CCAACTGCAG CCCAGCCTGC 
541 AAACCGGCAG CGAGGAGCTG CGCAGCCTGT ACAACACCGT GGCCACGCTG TACTGCGTCC 
601 ACCAGCGCAT CGAAATCAAG GATACGAAAG AGGCCCTGGA TAAAATCGAA GAGGAACAGA 
661 ATAAGAGCAA AAAGAAGGCC CAACAGGCCG CCGCGGACAC CGGACACAGC AACCAGGTCA 
721 GCCAGAACTA CCCCATCGTG CAGAACATCC AGGGGCAGAT GGTGCACCAG GCCATCTCCC 
781 CCCGCACGCT GAACGCCTGG GTGAAGGTGG TGGAAGAGAA GGCTTTTAGC CCGGAGGTGA 
841 TACCCATGTT CTCAGCCCTG TCAGAGGGAG CCACCCCCCA AGATCTGAAC ACCATGCTCA 
901 ACACAGTGGG GGGACACCAG GCCGCCATGC AGATGCTGAA GGAGACCATC AATGAGGAGG 
961 CTGCCGAATG GGATCGTGTG CATCCGGTGC ACGCAGGGCC CATCGCACCG GGCCAGATGC 
1021 GTGAGCCACG GGGCTCAGAC ATCGCCGGAA CGACTAGTAC CCTTCAGGAA CAGATCGGCT 
1081 GGATGACCAA CAACCCACCC ATCCCGGTGG GAGAAATCTA CAAACGCTGG ATCATCCTGG 
1141 GCCTGAACAA GATCGTGCGC ATGTATAGCC CTACCAGCAT CCTGGACATC CGCCAAGGCC 
1201 CGAAGGAACC CTTTCGCGAC TACGTGGACC GGTTCTACAA AACGCTCCGC GCCGAGCAGG 
1261 CTAGCCAGGA GGTGAAGAAC TGGATGACCG AAACCCTGCT GGTCCAGAAC GCGAACCCGG 
1321 ACTGCAAGAC GATCCTGAAG GCCCTGGGCC CAGCGGCTAC CCTAGAGGAA ATGATGACCG 
13 81 CCTGTCAGGG AGTGGGCGGA CCCGGCCACA AGGCACGCGT CCTGGCTGAG GCCATGAGCC 
1441 AGGTGACCAA CTCCGCTACC ATCATGATGC AGCGCGGCAA CTTTCGGAAC CAACGCAAGA 
1501 TCGTCAAGTG CTTCAACTGT GGCAAAGAAG GGCACACAGC CCGCAACTGC AGGGCCCCTA 
1561 GGAAAAAGGG CTGTTGGAAA TGTGGAAAGG AAGGACACCA AATGAAAGAT TGTACTGAGA 
1621 GACAGGCTAA TTTTTTAGGG AAGATCTGGC CTTCCCACAA GGGAAGGCCA GGGAATTTTC 
1681 TTCAGAGCAG ACCAGAGCCA ACAGCCCCAC CAGAAGAGAG CTTCAGGTTT GGGGAAGAGA 
1741 CAACAACTCC CTCTCAGAAG CAGGAGCCGA TAGACAAGGA ACTGTATCCT TTAGCTTCCC 
1801 TCAGATCACT CTTTGGCAGC GACCCCTCGT CACAATAAAG ATAGGGGGGC AGCTCAAGGA 
1861 GGCTCTCCTG GACACCGGAG CAGACGACAC CGTGCTGGAG GAGATGTCGT TGCCAGGCCG 
1921 CTGGAAGCCG AAGATGATCG GGGGAATCGG CGGTTTCATC AAGGTGCGCC AGTATGACCA 
1981 GATCCTCATC GAAATCTGCG GCCACAAGGC TATCGGTACC GTGCTGGTGG GCCCCACACC 
2041 CGTCAACATC ATCGGACGCA ACCTGTTGAC GCAGATCGGT TGCACGCTGA ACTTCCCCAT 
2101 TAGCCCTATC GAGACGGTAC CGGTGAAGCT GAAGCCCGGG ATGGACGGCC CGAAGGTCAA 
2161 GCAATGGCCA TTGACAGAGG AGAAGATCAA GGCACTGGTG GAGATTTGCA CAGAGATGGA 
2221 AAAGGAAGGG AAAATCTCCA AGATTGGGCC TGAGAACCCG TACAACACGC CGGTGTTCGC 
2281 AATCAAGAAG AAGGACTCGA CGAAATGGCG CAAGCTGGTG GACTTCCGCG AGCTGAACAA 
2341 GCGCACGCAA GACTTCTGGG AGGTTCAGCT GGGCATCCCG CACCCCGCAG GGCTGAAGAA 
2401 GAAGAAATCC GTGACCGTAC TGGATGTGGG TGATGCCTAC TTCTCCGTTC CCCTGGACGA 
2461 AGACTTCAGG AAGTACACTG CCTTCACAAT CCCTTCGATC AACAACGAGA CACCGGGGAT 
2521 TCGATATCAG TACAACGTGC TGCCCCAGGG CTGGAAAGGC TCTCCCGCAA TCTTCCAGAG 
2581 TAGCATGACC AAAATCCTGG AGCCTTTCCG CAAACAGAAC CCCGACATCG TCATCTATCA 
2641 GTACATGGAT GACTTGTACG TGGGCTCTGA TCTAGAGATA GGGCAGCACC GCACCAAGAT 
2701 CGAGGAGCTG CGCCAGCACC TGTTGAGGTG GGGACTGACC ACACCCGACA AGAAGCACCA 

2 761 GAAGGAGCCT CCCTTCCTCT GGATGGGTTA CGAGCTGCAC CCTGACAAAT GGACCGTGCA 
2821 GCCTATCGTG CTGCCAGAGA AAGACAGCTG GACTGTCAAC GACATACAGA AGCTGGTGGG 
2881 GAAGTTGAAC TGGGCCAGTC AGATTTACCC AGGGATTAAG GTGAGGCAGC TGTGCAAACT 
2941 CCTCCGCGGA ACCAAGGCAC TCACAGAGGT GATCCCCCTA ACCGAGGAGG CCGAGCTCGA 

3 001 ACTGGCAGAA AACCGAGAGA TCCTAAAGGA GCCCGTGCAC GGCGTGTACT ATGACCCCTC 
3061 CAAGGACCTG ATCGCCGAGA TCCAGAAGCA GGGGCAAGGC CAGTGGACCT ATCAGATTTA 
3121 CCAGGAGCCC TTCAAGAACC TGAAGACCGG CAAGTACGCC CGGATGAGGG GTGCCCACAC 
3181 TAACGACGTC AAGCAGCTGA CCGAGGCCGT GCAGAAGATC ACCACCGAAA GCATCGTGAT 
3241 CTGGGGAAAG ACTCCTAAGT TCAAGCTGCC CATCCAGAAG GAAACCTGGG AAACCTGGTG 
3301 GACAGAGTAT TGGCAGGCCA CCTGGATTCC TGAGTGGGAG TTCGTCAACA CCCCTCCCCT 
3361 GGTGAAGCTG TGGTACCAGC TGGAGAAGGA GCCCATAGTG GGCGCCGAAA CCTTCTACGT 
3421 GGATGGGGCC GCTAACAGGG AGACTAAGCT GGGCAAAGCC GGATACGTCA CTAACCGGGG
3481 CAGACAGAAG GTTGTCACCC TCACTGACAC CACCAACCAG AAGACTGAGC TGCAGGCCAT
3541 TTACCTCGCT TTGCAGGACT CGGGCCTGGA GGTGAACATC GTGACAGACT CTCAGTATGC
3601 CCTGGGCATC ATTCAAGCCC AGCCAGACCA GAGTGAGTCC GAGCTGGTCA ATCAGATCAT
3661 CGAGCAGCTG ATCAAGAAGG AAAAGGTCTA TCTGGCCTGG GTACCCGCCC ACAAAGGCAT
3721 TGGCGGCAAT GAGCAGGTCG ACAAGCTGGT CTCGGCTGGC ATCAGGAAGG TGCTATTCCT
3781 GGATGGCATC GACAAGGCCC AGGACGAGCA CGAGAAATAC CACAGCAACT GGCGGGCCAT
3841 GGCTAGCGAC TTCAACCTGC CCCCTGTGGT GGCCAAAGAG ATCGTGGCCA GCTGTGACAA
3901 GTGTCAGCTC AAGGGCGAAG CCATGCATGG CCAGGTGGAC TGTAGCCCCG GCATCTGGCA
3961 ACTCGATTGC ACCCATCTGG AGGGCAAGGT TATCCTGGTA GCCGTCCATG TGGCCAGTGG
4021 CTACATCGAG GCCGAGGTCA TTCCCGCCGA AACAGGGCAG GAGACAGCCT ACTTCCTCCT
4081 GAAGCTGGCA GGCCGGTGGC CAGTGAAGAC CATCCATACT GACAATGGCA GCAATTTCAC
4141 CAGTGCTACG GTTAAGGCCG CCTGCTGGTG GGCGGGAATC AAGCAGGAGT TCGGGATCCC
4201 CTACAATCCC CAGAGTCAGG GCGTCGTCGA GTCTATGAAT AAGGAGTTAA AGAAGATTAT
4261 CGGCCAGGTC AGAGATCAGG CTGAGCATCT CAAGACCGCG GTCCAAATGG CGGTATTCAT
4321 CCACAATTTC AAGCGGAAGG GGGGGATTGG GGGGTACAGT GCGGGGGAGC GGATCGTGGA
4381 CATCATCGCG ACCGACATCC AGACTAAGGA GCTGCAAAAG CAGATTACCA AGATTCAGAA
4441 TTTCCGGGTC TACTACAGGG ACAGCAGAAA TCCCCTCTGG AAAGGCCCAG CGAAGCTCCT
4501 CTGGAAGGGT GAGGGGGCAG TAGTGATCCA GGATAATAGC GACATCAAGG TGGTGCCCAG
4561 AAGAAAGGCG AAGATCATTA GGGATTATGG CAAACAGATG GCGGGTGATG ATTGCGTGGC
4621 GAGCAGACAG GATGAGGATT AG



SEQ. ID. NO. 13 - pSYNGP3 - codon optimised HIV-1 gagpol with leader sequence from
the major splice donor

1 GTGAGTACGC CAAAAATTTT GACTAGCGGA GGCTAGAAGG AGAGAGATGG GCGCCCGCGC 

61 CAGCGTGCTG TCGGGCGGCG AGCTGGACCG CTGGGAGAAG ATCCGCCTGC GCCCCGGCGG 

121 CAAAAAGAAG TACAAGCTGA AGCACATCGT GTGGGCCAGC CGCGAACTGG AGCGCTTCGC 

181 CGTGAACCCC GGGCTCCTGG AGACCAGCGA GGGGTGCCGC CAGATCCTCG GCCAACTGCA 

241 GCCCAGCCTG CAAACCGGCA GCGAGGAGCT GCGCAGCCTG TACAACACCG TGGCCACGCT 

301 GTACTGCGTC CACCAGCGCA TCGAAATCAA GGATACGAAA GAGGCCCTGG ATAAAATCGA 

361 AGAGGAACAG AATAAGAGCA AAAAGAAGGC CCAACAGGCC GCCGCGGACA CCGGACACAG

421 CAACCAGGTC AGCCAGAACT ACCCCATCGT GCAGAACATC CAGGGGCAGA TGGTGCACCA 

481 GGCCATCTCC CCCCGCACGC TGAACGCCTG GGTGAAGGTG GTGGAAGAGA AGGCTTTTAG

541 CCCGGAGGTG ATACCCATGT TCTCAGCCCT GTCAGAGGGA GCCACCCCCC AAGATCTGAA 

601 CACCATGCTC AACACAGTGG GGGGACACCA GGCCGCCATG CAGATGCTGA AGGAGACCAT 

661 CAATGAGGAG GCTGCCGAAT GGGATCGTGT GCATCCGGTG CACGCAGGGC CCATCGCACC 

721 GGGCCAGATG CGTGAGCCAC GGGGCTCAGA CATCGCCGGA ACGACTAGTA CCCTTCAGGA 

781 ACAGATCGGC TGGATGACCA ACAACCCACC CATCCCGGTG GGAGAAATCT ACAAACGCTG 

841 GATCATCCTG GGCCTGAACA AGATCGTGCG CATGTATAGC CCTACCAGCA TCCTGGACAT 

901 CCGCCAAGGC CCGAAGGAAC CCTTTCGCGA CTACGTGGAC CGGTTCTACA AAACGCTCCG 

961 CGCCGAGCAG GCTAGCCAGG AGGTGAAGAA CTGGATGACC GAAACCCTGC TGGTCCAGAA 

1021 CGCGAACCCG GACTGCAAGA CGATCCTGAA GGCCCTGGGC CCAGCGGCTA CCCTAGAGGA 

1081 AATGATGACC GCCTGTCAGG GAGTGGGCGG ACCCGGCCAC AAGGCACGCG TCCTGGCTGA 

1141 GGCCATGAGC CAGGTGACCA ACTCCGCTAC CATCATGATG CAGCGCGGCA ACTTTCGGAA 

12 01 CCAACGCAAG ATCGTCAAGT GCTTCAACTG TGGCAAAGAA GGGCACACAG CCCGCAACTG 
1261 CAGGGCCCCT AGGAAAAAGG GCTGTTGGAA ATGTGGAAAG GAAGGACACC AAATGAAAGA
1321 TTGTACTGAG AGACAGGCTA ATTTTTTAGG GAAGATCTGG CCTTCCCACA AGGGAAGGCC 

13 81 AGGGAATTTT CTTCAGAGCA GACCAGAGCC AACAGCCCCA CCAGAAGAGA GCTTCAGGTT 
1441 TGGGGAAGAG ACAACAACTC CCTCTCAGAA GCAGGAGCCG ATAGACAAGG AACTGTATCC 
1501 TTTAGCTTCC CTCAGATCAC TCTTTGGCAG CGACCCCTCG TCACAATAAA GATAGGGGGG 
1561 CAGCTCAAGG AGGCTCTCCT GGACACCGGA GCAGACGACA CCGTGCTGGA GGAGATGTCG 
1621 TTGCCAGGCC GCTGGAAGCC GAAGATGATC GGGGGAATCG GCGGTTTCAT CAAGGTGCGC 
1681 CAGTATGACC AGATCCTCAT CGAAATCTGC GGCCACAAGG CTATCGGTAC CGTGCTGGTG 
1741 GGCCCCACAC CCGTCAACAT CATCGGACGC AACCTGTTGA CGCAGATCGG TTGCACGCTG 
1801 AACTTCCCCA TTAGCCCTAT CGAGACGGTA CCGGTGAAGC TGAAGCCCGG GATGGACGGC 
1861 CCGAAGGTCA AGCAATGGCC ATTGACAGAG GAGAAGATCA AGGCACTGGT GGAGATTTGC 
1921 AGAGAGATGG AAAAGGAAGG GAAAATCTCC AAGATTGGGC CTGAGAACCC GTACAACACG 
1981 CCGGTGTTCG CAATCAAGAA GAAGGACTCG ACGAAATGGC GCAAGCTGGT GGACTTCCGC 
2041 GAGCTGAACA AGCGCACGCA AGACTTCTGG GAGGTTCAGC TGGGCATCCC GCACCCCGCA 
2101 GGGCTGAAGA AGAAGAAATC CGTGACCGTA CTGGATGTGG GTGATGCCTA CTTCTCCGTT 
2161 CCCCTGGACG AAGACTTCAG GAAGTACACT GCCTTCACAA TCCCTTCGAT CAACAACGAG 
2221 ACACCGGGGA TTCGATATCA GTACAACGTG CTGCCCCAGG GCTGGAAAGG CTCTCCCGCA 
2281 ATCTTCCAGA GTAGCATGAC CAAAATCCTG GAGCCTTTCC GCAAACAGAA CCCCGACATC
2341 GTCATCTATC AGTACATGGA TGACTTGTAC GTGGGCTCTG ATCTAGAGAT AGGGCAGCAC 
24 01 CGCACCAAGA TCGAGGAGCT GCGCCAGCAC CTGTTGAGGT GGGGACTGAC CACACCCGAC 
2461 AAGAAGCACC AGAAGGAGCC TCCCTTCCTC TGGATGGGTT ACGAGCTGCA CCCTGACAAA 
2521 TGGACCGTGC AGCCTATCGT GCTGCCAGAG AAAGACAGCT GGACTGTCAA CGACATACAG 

2 581 AAGCTGGTGG GGAAGTTGAA CTGGGCCAGT CAGATTTACC CAGGGATTAA GGTGAGGCAG 
2641 CTGTGCAAAC TCCTCCGCGG AACCAAGGCA CTCACAGAGG TGATCCCCCT AACCGAGGAG 
27 01 GCCGAGCTCG AACTGGCAGA AAACCGAGAG ATCCTAAAGG AGCCCGTGCA CGGCGTGTAC 
2761 TATGACCCCT CCAAGGACCT GATCGCCGAG ATCCAGAAGC AGGGGCAAGG CCAGTGGACC 
2821 TATCAGATTT ACCAGGAGCC CTTCAAGAAC CTGAAGACCG GCAAGTACGC CCGGATGAGG 
2881 GGTGCCCACA CTAACGACGT CAAGCAGCTG ACCGAGGCCG TGCAGAAGAT CACCACCGAA 
2941 AGCATCGTGA TCTGGGGAAA GACTCCTAAG TTCAAGCTGC CCATCCAGAA GGAAACCTGG 
3001 GAAACCTGGT GGACAGAGTA TTGGCAGGCC ACCTGGATTC CTGAGTGGGA GTTCGTCAAC 
3061 ACCCCTCCCC TGGTGAAGCT GTGGTACCAG CTGGAGAAGG AGCCCATAGT GGGCGCCGAA 
3121 ACCTTCTACG TGGATGGGGC CGCTAACAGG GAGACTAAGC TGGGCAAAGC CGGATACGTC 
3181 ACTAACCGGG GCAGACAGAA GGTTGTCACC CTCACTGACA CCACCAACCA GAAGACTGAG 
3241 CTGCAGGCCA TTTACCTCGC TTTGCAGGAC TCGGGCCTGG AGGTGAACAT CGTGACAGAC 
33 01 TCTCAGTATG CCCTGGGCAT CATTCAAGCC CAGCCAGACC AGAGTGAGTC CGAGCTGGTC 

3 3 61 AATCAGATCA TCGAGGAGCT GATCAAGAAG GAAAAGGTCT ATCTGGCCTG GGTACCCGCC 
3421 CACAAAGGCA TTGGCGGCAA TGAGCAGGTC GACAAGCTGG TCTCGGCTGG CATCAGGAAG 
3481 GTGCTATTCC TGGATGGCAT CGACAAGGCC CAGGACGAGC ACGAGAAATA CCACAGCAAC 
3 541 TGGCGGGCCA TGGCTAGCGA CTTCAACCTG CCCCCTGTGG TGGCCAAAGA GATCGTGGCC 
3601 AGCTGTGACA AGTGTCAGCT CAAGGGCGAA GCCATGCATG GCCAGGTGGA CTGTAGCCCC 
3 661 GGCATCTGGC AACTCGATTG CACCCATCTG GAGGGCAAGG TTATCCTGGT AGCCGTCCAT 
3 721 GTGGCCAGTG GCTACATCGA GGCCGAGGTC ATTCCCGCCG AAACAGGGCA GGAGACAGCC 
3 781 TACTTCCTCC TGAAGCTGGC AGGCCGGTGG CCAGTGAAGA CCATCCATAC TGACAATGGC 
3 841 AGCAATTTCA CCAGTGCTAC GGTTAAGGCC GCCTGCTGGT GGGCGGGAAT CAAGCAGGAG 
3 901 TTCGGGATCC CCTACAATCC CCAGAGTCAG GGCGTCGTCG AGTCTATGAA TAAGGAGTTA 
3 961 AAGAAGATTA TCGGCCAGGT CAGAGATCAG GCTGAGCATC TCAAGACCGC GGTCCAAATG 
4021 GCGGTATTCA TCCACAATTT CAAGCGGAAG GGGGGGATTG GGGGGTACAG TGCGGGGGAG 
4081 CGGATCGTGG ACATCATCGC GACCGACATC CAGACTAAGG AGCTGCAAAA GCAGATTACC 
4141 AAGATTCAGA ATTTCCGGGT CTACTACAGG GACAGCAGAA ATCCCCTCTG GAAAGGCCCA 
42 01 GCGAAGCTCC TCTGGAAGGG TGAGGGGGCA GTAGTGATCC AGGATAATAG CGACATCAAG 
42 61 GTGGTGCCCA GAAGAAAGGC GAAGATCATT AGGGATTATG GCAAACAGAT GGCGGGTGAT 
4321 GATTGCGTGG CGAGCAGACA GGATGAGGAT TAG 

SEQ. ID. NO. 14 - pSYNGP4 - codon optimised HIV-1 gagpol with 20 bp of the leader
sequence of HIV-1 upstream of the ATG start codon

1 CGGAGGCTAG AAGGAGAGAG ATGGGCGCCC GCGCCAGCGT GCTGTCGGGC GGCGAGCTGG 
61 ACCGCTGGGA GAAGATCCGC CTGCGCCCCG GCGGCAAAAA GAAGTACAAG CTGAAGCACA 
121 TCGTGTGGGC CAGCCGCGAA CTGGAGCGCT TCGCCGTGAA CCCCGGGCTC CTGGAGACCA 
181 GCGAGGGGTG CCGCCAGATC CTCGGCCAAC TGCAGCCCAG CCTGCAAACC GGCAGCGAGG 
241 AGCTGCGCAG CCTGTACAAC ACCGTGGCCA CGCTGTACTG CGTCCACCAG CGCATCGAAA 
301 TCAAGGATAC GAAAGAGGCC CTGGATAAAA TCGAAGAGGA ACAGAATAAG AGCAAAAAGA
361 AGGCCCAACA GGCCGCCGCG GACACCGGAC ACAGCAACCA GGTCAGCCAG AACTACCCCA 
421 TCGTGCAGAA CATCCAGGGG CAGATGGTGC ACCAGGCCAT CTCCCCCCGC ACGCTGAACG 
481 CCTGGGTGAA GGTGGTGGAA GAGAAGGCTT TTAGCCCGGA GGTGATACCC ATGTTCTCAG 
541 CCCTGTCAGA GGGAGCCACC CCCCAAGATC TGAACACCAT GCTCAACACA GTGGGGGGAC 
601 ACCAGGCCGC CATGCAGATG CTGAAGGAGA CCATCAATGA GGAGGCTGCC GAATGGGATC 
661 GTGTGCATCC GGTGCACGCA GGGCCCATCG CACCGGGCCA GATGCGTGAG CCACGGGGCT 
721 CAGACATCGC CGGAACGACT AGTACCCTTC AGGAACAGAT CGGCTGGATG ACCAACAACC 
781 CACCCATCCC GGTGGGAGAA ATCTACAAAC GCTGGATCAT CCTGGGCCTG AACAAGATCG 
841 TGCGCATGTA TAGCCCTACC AGCATCCTGG ACATCCGCCA AGGCCCGAAG GAACCCTTTC 
901 GCGACTACGT GGACCGGTTC TACAAAACGC TCCGCGCCGA GCAGGCTAGC CAGGAGGTGA 
961 AGAACTGGAT GACCGAAACC CTGCTGGTCC AGAACGCGAA CCCGGACTGC AAGACGATCC 
1021 TGAAGGCCCT GGGCCCAGCG GCTACCCTAG AGGAAATGAT GACCGCCTGT CAGGGAGTGG 
1081 GCGGACCCGG CCACAAGGCA CGCGTCCTGG CTGAGGCCAT GAGCCAGGTG ACCAACTCCG 
1141 CTACCATCAT GATGCAGCGC GGCAACTTTC GGAACCAACG CAAGATCGTC AAGTGCTTCA 
1201 ACTGTGGCAA AGAAGGGCAC ACAGCCCGCA ACTGCAGGGC CCCTAGGAAA AAGGGCTGTT 
1261 GGAAATGTGG AAAGGAAGGA CACCAAATGA AAGATTGTAC TGAGAGACAG GCTAATTTTT 
1321 TAGGGAAGAT CTGGCCTTCC CACAAGGGAA GGCCAGGGAA TTTTCTTCAG AGCAGACCAG 
1381 AGCCAACAGC CCCACCAGAA GAGAGCTTCA GGTTTGGGGA AGAGACAACA ACTCCCTCTC 
1441 AGAAGCAGGA GCCGATAGAC AAGGAACTGT ATCCTTTAGC TTCCCTCAGA TCACTCTTTG 
1501 GCAGCGACCC CTCGTCACAA TAAAGATAGG GGGGCAGCTC AAGGAGGCTC TCCTGGACAC
1561 CGGAGCAGAC GACACCGTGC TGGAGGAGAT GTCGTTGCCA GGCCGCTGGA AGCCGAAGAT 

1621 GATCGGGGGA ATCGGCGGTT TCATCAAGGT GCGCCAGTAT GACCAGATCC TCATCGAAAT 

1681 CTGCGGCCAC AAGGCTATCG GTACCGTGCT GGTGGGCCCC ACACCCGTCA ACATCATCGG 

1741 ACGCAACCTG TTGACGCAGA TCGGTTGCAC GCTGAACTTC CCCATTAGCC CTATCGAGAC 

1801 GGTACCGGTG AAGCTGAAGC CCGGGATGGA CGGCCCGAAG GTCAAGCAAT GGCCATTGAC 

1861 AGAGGAGAAG ATCAAGGCAC TGGTGGAGAT TTGCACAGAG ATGGAAAAGG AAGGGAAAAT 

1921 CTCCAAGATT GGGCCTGAGA ACCCGTACAA CACGCCGGTG TTCGCAATCA AGAAGAAGGA 

1981 CTCGACGAAA TGGCGCAAGC TGGTGGACTT CCGCGAGCTG AACAAGCGCA CGCAAGACTT 

2041 CTGGGAGGTT CAGCTGGGCA TCCCGCACCC CGCAGGGCTG AAGAAGAAGA AATCCGTGAC 

2101 CGTACTGGAT GTGGGTGATG CCTACTTCTC CGTTCCCCTG GACGAAGACT TCAGGAAGTA 

2161 CACTGCCTTC ACAATCCCTT CGATCAACAA CGAGACACCG GGGATTCGAT ATCAGTACAA 

2221 CGTGCTGCCC CAGGGCTGGA AAGGCTCTCC CGCAATCTTC CAGAGTAGCA TGACCAAAAT 

2281 CCTGGAGCCT TTCCGCAAAC AGAACCCCGA CATCGTCATC TATCAGTACA TGGATGACTT 

2341 GTACGTGGGC TCTGATCTAG AGATAGGGCA GCACCGCACC AAGATCGAGG AGCTGCGCCA 

2401 GCACCTGTTG AGGTGGGGAC TGACCACACC CGACAAGAAG CACCAGAAGG AGCCTCCCTT 

2461 CCTCTGGATG GGTTACGAGC TGCACCCTGA CAAATGGACC GTGCAGCCTA TCGTGCTGCC 

2521 AGAGAAAGAC AGCTGGACTG TCAACGACAT ACAGAAGCTG GTGGGGAAGT TGAACTGGGC 

2581 CAGTCAGATT TACCCAGGGA TTAAGGTGAG GCAGCTGTGC AAACTCCTCC GCGGAACCAA 

2641 GGCACTCACA GAGGTGATCC CCCTAACCGA GGAGGCCGAG CTCGAACTGG CAGAAAACCG 

2701 AGAGATCCTA AAGGAGCCCG TGCACGGCGT GTACTATGAC CCCTCCAAGG ACCTGATCGC 

2761 CGAGATCCAG AAGCAGGGGC AAGGCCAGTG GACCTATCAG ATTTACCAGG AGCCCTTCAA 
2821 GAACCTGAAG ACCGGCAAGT ACGCCCGGAT GAGGGGTGCC CACACTAACG ACGTCAAGCA 
2881 GCTGACCGAG GCCGTGCAGA AGATCACCAC CGAAAGCATC GTGATCTGGG GAAAGACTCC 
2941 TAAGTTCAAG CTGCCCATCC AGAAGGAAAC CTGGGAAACC TGGTGGACAG AGTATTGGCA 
3001 GGCCACCTGG ATTCCTGAGT GGGAGTTCGT CAACACCCCT CCCCTGGTGA AGCTGTGGTA 

3061 CCAGCTGGAG AAGGAGCCCA TAGTGGGCGC CGAAACCTTC TACGTGGATG GGGCCGCTAA 
3121 CAGGGAGACT AAGCTGGGCA AAGCCGGATA CGTCACTAAC CGGGGCAGAC AGAAGGTTGT 
3181 CACCCTCACT GACACCACCA ACCAGAAGAC TGAGCTGCAG GCCATTTACC TCGCTTTGCA 
3241 GGACTCGGGC CTGGAGGTGA ACATCGTGAC AGACTCTCAG TATGCCCTGG GCATCATTCA 
3301 AGCCCAGCCA GACCAGAGTG AGTCCGAGCT GGTCAATCAG ATCATCGAGC AGCTGATCAA 
3361 GAAGGAAAAG GTCTATCTGG CCTGGGTACC CGCCCACAAA GGCATTGGCG GCAATGAGCA 
3421 GGTCGACAAG CTGGTCTCGG CTGGCATCAG GAAGGTGCTA TTCCTGGATG GCATCGACAA 
3481 GGCCCAGGAC GAGCACGAGA AATACCACAG CAACTGGCGG GCCATGGCTA GCGACTTCAA 
3541 CCTGCCCCCT GTGGTGGCCA AAGAGATCGT GGCCAGCTGT GACAAGTGTC AGCTCAAGGG 
3601 CGAAGCCATG CATGGCCAGG TGGACTGTAG CCCCGGCATC TGGCAACTCG ATTGCACCCA 
3661 TCTGGAGGGC AAGGTTATCC TGGTAGCCGT CCATGTGGCC AGTGGCTACA TCGAGGCCGA 
3721 GGTCATTCCC GCCGAAACAG GGCAGGAGAC AGCCTACTTC CTCCTGAAGC TGGCAGGCCG 
3781 GTGGCCAGTG AAGACCATCC ATACTGACAA TGGCAGCAAT TTCACCAGTG CTACGGTTAA 
3841 GGCCGCCTGC TGGTGGGCGG GAATCAAGCA GGAGTTCGGG ATCCCCTACA ATCCCCAGAG 
3901 TCAGGGCGTC GTCGAGTCTA TGAATAAGGA GTTAAAGAAG ATTATCGGCC AGGTCAGAGA 

3961 TCAGGCTGAG CATCTCAAGA CCGCGGTCCA AATGGCGGTA TTCATCCACA ATTTCAAGCG 
4021 GAAGGGGGGG ATTGGGGGGT ACAGTGCGGG GGAGCGGATC GTGGACATCA TCGCGACCGA 

4081 CATCCAGACT AAGGAGCTGC AAAAGCAGAT TACCAAGATT CAGAATTTCC GGGTCTACTA 
4141 CAGGGACAGC AGAAATCCCC TCTGGAAAGG CCCAGCGAAG CTCCTCTGGA AGGGTGAGGG 
4201 GGCAGTAGTG ATCCAGGATA ATAGCGACAT CAAGGTGGTG CCCAGAAGAA AGGCGAAGAT 
4261 CATTAGGGAT TATGGCAAAC AGATGGCGGG TGATGATTGC GTGGCGAGCA GACAGGATGA 
4321 GGATTAG 